Paul Christiano
Paul Christiano is an ai-alignment researcher widely regarded as one of the most influential technical thinkers in the field. He ran the language model alignment team at openai before founding the Alignment Research Center (ARC), an independent research organization focused on evaluating and mitigating risks from advanced AI systems. ARC later spun out its evaluations team, ARC Evals, which became metr, a leading AI evaluation organization.
Research Contributions
Christiano’s signature contribution is iterative-amplification, a training approach designed to preserve alignment as AI systems become increasingly capable. The core idea is to start from a weak AI that a human can oversee directly, then use copies of the current AI as assistants that help the human act as a more competent overseer. As the AI grows more capable, the human-plus-AI-copies team grows correspondingly more capable as an oversight mechanism. Christiano candidly acknowledges that “by the end of training, the human’s role becomes kind of minimal”, a point that distinguishes his approach from methods that assume indefinite direct human oversight.
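Schematically, each round alternates an amplification step (human plus copies of the current model produce answers) with a distillation step (the next model is trained to imitate that amplified team). The Python sketch below is a toy illustration under strong assumptions: the lookup-table “model” and the `decompose`/`combine` functions standing in for human judgment are hypothetical, not Christiano’s actual implementation.

```python
# Toy sketch of an iterated amplification loop. The "model" is a lookup
# table and the "human" is a pair of supplied functions; both are
# illustrative stand-ins, not a real training setup.

def amplify(model, decompose, combine, question):
    """Human plus copies of the current model act as a stronger overseer."""
    subanswers = [model.get(sub, "unknown") for sub in decompose(question)]
    return combine(question, subanswers)

def distill(amplified_pairs):
    """Stand-in for supervised training: memorize the overseer's answers."""
    return dict(amplified_pairs)

def iterated_amplification(decompose, combine, questions, rounds=3):
    model = {}  # start from a weak model the human can oversee directly
    for _ in range(rounds):
        # The amplified overseer (human + current model) produces the
        # training targets for the next, more capable model.
        targets = [(q, amplify(model, decompose, combine, q))
                   for q in questions]
        model = distill(targets)
    return model
```

The property the sketch tries to capture is that the overseer in each round is the human plus the current model, so the oversight mechanism is always at least as capable as the model being trained.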
His work on scalable-oversight — the challenge of supervising systems that are more capable than any individual human — has been foundational to subsequent research at both openai and anthropic, including constitutional AI and other recursive oversight methods.
Framing of AI Risk
Christiano frames alignment as a practical engineering challenge rather than an abstract doom scenario: “the problem of building AI systems that are trying to do the thing that we want them to do.” He identifies two layers of difficulty: the philosophical challenge of defining good behavior for very powerful AI, and the distribution-shift problem, where a system that seems aligned in training behaves dangerously in novel situations.
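The distribution-shift worry can be made concrete with a toy model: a system that performs perfectly on its training distribution can fail outright once a correlation it relied on breaks. The numpy sketch below is a hypothetical illustration of that failure mode, not an example drawn from Christiano’s work.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Training distribution: feature x1 is a spurious proxy that perfectly
# tracks the label; feature x0 is the real but noisy signal.
y = rng.choice([-1.0, 1.0], size=n)
x0 = y + rng.normal(scale=2.0, size=n)   # weak causal feature
x1 = y                                    # spurious proxy (train only)
X_train = np.stack([x0, x1], axis=1)

# Least squares latches onto the proxy, since it fits training exactly.
w, *_ = np.linalg.lstsq(X_train, y, rcond=None)

# Novel situation: the proxy decouples from the label.
y_test = rng.choice([-1.0, 1.0], size=n)
x0_t = y_test + rng.normal(scale=2.0, size=n)
x1_t = rng.choice([-1.0, 1.0], size=n)   # proxy now uninformative
X_test = np.stack([x0_t, x1_t], axis=1)

acc = lambda X, t: np.mean(np.sign(X @ w) == t)
print(f"training distribution: {acc(X_train, y):.2f}")     # ~1.00
print(f"after the shift:       {acc(X_test, y_test):.2f}")  # ~0.50, chance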
Beyond extinction risk, Christiano emphasizes the subtler danger of “bungling the transition” as humanity passes the torch to AI systems — a framing that expands alignment concern to include value drift and loss of human agency.
Career Philosophy
Christiano argues that safety should not be siloed as a separate team within AI companies; ideally, people involved in AI development should “basically be alignment researchers,” with safety integrated into core development.
Related Pages
- ai-alignment
- iterative-amplification
- scalable-oversight
- openai
- metr
- anthropic
- 80000-hours
- 80k-podcast-paul-christiano
- benjamin-todd
- carl-shulman
- summary-substack-benjamin-todd
- concrete-problems-in-ai-safety
- agi-personal-preparation
- jan-leike
- rob-wiblin
- 80k-podcast-jan-leike-superalignment
- 80k-podcast-olsson-ziegler-ml-engineering