Paul Christiano
Paul Christiano is an ai-alignment researcher widely regarded as one of the most influential technical thinkers in the field. He ran the language model alignment team at openai before founding the Alignment Research Center (ARC), an independent research organization focused on evaluating and mitigating risks from advanced AI systems. ARC later spun out its evaluations team, ARC Evals, which became metr, a leading AI evaluation organization.
Research Contributions
Christiano’s signature contribution is iterative-amplification, a training approach designed to preserve alignment as AI systems become increasingly capable. The core idea is to start from a weak AI that a human can oversee directly, then use copies of the current AI as assistants that help the human act as a more competent overseer. As the AI grows more capable, the human-plus-AI-copies team grows correspondingly more capable as an oversight mechanism. Christiano candidly acknowledges that “by the end of training, the human’s role becomes kind of minimal”, a point that distinguishes his approach from methods that assume indefinite direct human oversight.
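Schematically, each round alternates an amplification step (human plus copies of the current model produce answers) with a distillation step (the next model is trained to imitate that amplified team). The Python sketch below is a toy illustration under strong assumptions: the lookup-table “model” and the `decompose`/`combine` functions standing in for human judgment are hypothetical, not Christiano’s actual implementation.

```python
# Toy sketch of an iterated amplification loop. The "model" is a lookup
# table and the "human" is a pair of supplied functions; both are
# illustrative stand-ins, not a real training setup.

def amplify(model, decompose, combine, question):
    """Human plus copies of the current model act as a stronger overseer."""
    subanswers = [model.get(sub, "unknown") for sub in decompose(question)]
    return combine(question, subanswers)

def distill(amplified_pairs):
    """Stand-in for supervised training: memorize the overseer's answers."""
    return dict(amplified_pairs)

def iterated_amplification(decompose, combine, questions, rounds=3):
    model = {}  # start from a weak model the human can oversee directly
    for _ in range(rounds):
        # The amplified overseer (human + current model) produces the
        # training targets for the next, more capable model.
        targets = [(q, amplify(model, decompose, combine, q))
                   for q in questions]
        model = distill(targets)
    return model
```

The property the sketch tries to capture is that the overseer in each round is the human plus the current model, so the oversight mechanism is always at least as capable as the model being trained.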
His work on scalable-oversight — the challenge of supervising systems that are more capable than any individual human — has been foundational to subsequent research at both openai and anthropic, including constitutional AI and other recursive oversight methods.
Framing of AI Risk
Christiano frames alignment as a practical engineering challenge rather than an abstract doom scenario: “the problem of building AI systems that are trying to do the thing that we want them to do.” He identifies two layers of difficulty: the philosophical challenge of defining good behavior for very powerful AI, and the distribution-shift problem, where a system that seems aligned in training behaves dangerously in novel situations.
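The distribution-shift worry can be made concrete with a toy model: a system that performs perfectly on its training distribution can fail outright once a correlation it relied on breaks. The numpy sketch below is a hypothetical illustration of that failure mode, not an example drawn from Christiano’s work.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Training distribution: feature x1 is a spurious proxy that perfectly
# tracks the label; feature x0 is the real but noisy signal.
y = rng.choice([-1.0, 1.0], size=n)
x0 = y + rng.normal(scale=2.0, size=n)   # weak causal feature
x1 = y                                    # spurious proxy (train only)
X_train = np.stack([x0, x1], axis=1)

# Least squares latches onto the proxy, since it fits training exactly.
w, *_ = np.linalg.lstsq(X_train, y, rcond=None)

# Novel situation: the proxy decouples from the label.
y_test = rng.choice([-1.0, 1.0], size=n)
x0_t = y_test + rng.normal(scale=2.0, size=n)
x1_t = rng.choice([-1.0, 1.0], size=n)   # proxy now uninformative
X_test = np.stack([x0_t, x1_t], axis=1)

acc = lambda X, t: np.mean(np.sign(X @ w) == t)
print(f"training distribution: {acc(X_train, y):.2f}")     # ~1.00
print(f"after the shift:       {acc(X_test, y_test):.2f}")  # ~0.50, chance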
Beyond extinction risk, Christiano emphasizes the subtler danger of “bungling the transition” as humanity passes the torch to AI systems — a framing that expands alignment concern to include value drift and loss of human agency.
Career Philosophy
Christiano argues that safety should not be siloed as a separate team within AI companies; ideally, people involved in AI development should “basically be alignment researchers,” with safety integrated into core development.
Related Pages
- ai-alignment
- iterative-amplification
- scalable-oversight
- openai
- metr
- anthropic
- 80000-hours
- 80k-podcast-paul-christiano
- benjamin-todd
- carl-shulman
- summary-substack-benjamin-todd
- concrete-problems-in-ai-safety
- agi-personal-preparation
- jan-leike
- rob-wiblin
- 80k-podcast-jan-leike-superalignment
- 80k-podcast-olsson-ziegler-ml-engineering