Preference gaps as a safeguard against AI self-replication
tbs, EJT — 2025-11-26
Source
- Link: https://www.lesswrong.com/posts/knwR9RgGN5a2oorci/preference-gaps-as-a-safeguard-against-ai-self-replication
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda(s):
- behavior-alignment-theory — Theory / Corrigibility