What Is The Alignment Problem?
johnswentworth — 2025-01-16 — LessWrong
Summary
Provides a conceptual framework for understanding what ‘the alignment problem’ means by analyzing which patterns must hold in the environment for concepts like optimization, agency, human values, and corrigibility to make sense at all.
Source
- Link: https://lesswrong.com/posts/dHNKtQ3vTBxTfTPxu/what-is-the-alignment-problem
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- agent-foundations — Theory