Why do misalignment risks increase as AIs get more capable?
Ryan Greenblatt — 2025-04-11 — Anthropic — LessWrong
Summary
Conceptual framework identifying three mechanisms by which misalignment risks increase with AI capabilities: increased likelihood of egregious misalignment, greater affordances and usage, and enhanced ability to acquire power. Includes concrete risk estimates for different capability levels and implications for trustedness evaluations.
Source
- Link: https://lesswrong.com/posts/NDotm7oLHfR56g4sD/why-do-misalignment-risks-increase-as-ais-get-more-capable
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- capability-evals — Evals