Why do misalignment risks increase as AIs get more capable?

Ryan Greenblatt — 2025-04-11 — Anthropic — LessWrong

Summary

Conceptual framework identifying three mechanisms by which misalignment risks increase with AI capabilities: increased likelihood of egregious misalignment, greater affordances and usage, and enhanced ability to acquire power. Includes concrete risk estimates for different capability levels and implications for trustedness evaluations.

Source