Concrete Problems in AI Safety
Concrete Problems in AI Safety is a 2016 paper by Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané (arXiv:1606.06565). It is one of the earliest and most influential technical AI safety research agendas: the paper helped translate philosophical concerns about advanced AI into concrete problems that machine-learning researchers could work on with current systems.
Five Concrete Problems
The paper identified five practical research problems framed around an autonomous cleaning robot:
- Avoiding Negative Side Effects — how to specify objectives that don’t reward disrupting the environment in unintended ways while pursuing a stated goal.
- Avoiding Reward Hacking — preventing systems from gaming a reward function (e.g. covering a mess instead of cleaning it; a toy sketch follows this list).
- Scalable Oversight (scalable-oversight) — efficiently using limited human supervision when full evaluation is impractical.
- Safe Exploration — exploring the environment without catastrophic actions during learning.
- Robustness to Distributional Shift (distribution-shift) — recognizing and behaving cautiously when the test distribution differs from training.
Why It Mattered
Before this paper, technical AI safety was associated with the MIRI/LessWrong community and largely framed around long-term superintelligence concerns. Concrete Problems demonstrated that the same problem space could be attacked using mainstream ML methods on present-day systems. Its co-authors went on to lead foundational AI safety teams:
- Dario Amodei: later co-founded Anthropic (2021) and is its CEO.
- Chris Olah: pioneer of mechanistic interpretability; co-founded Anthropic.
- paul-christiano: developed iterative-amplification and rlhf; founded the Alignment Research Center (ARC), whose evaluations team later spun out as METR (metr).
- John Schulman: co-founder of OpenAI; co-author of Unsolved Problems in ML Safety (2021); later joined Anthropic.
- Jacob Steinhardt: Berkeley researcher in robustness and trustworthy ML.
The author list became a kind of leading indicator for the field’s institutional trajectory.
Relationship to Unsolved Problems in ML Safety (2021)
The 2021 follow-up by Hendrycks, Carlini, Schulman, and Steinhardt updated the agenda with four problem areas — robustness, monitoring, alignment, systemic safety — that now structure the ai-safety research taxonomy on Wikipedia.
Connection to This Wiki
- Central reference in the history section of the ai-safety page.
- Foreshadows multiple wiki pages: scalable-oversight, distribution-shift, robustness, reward hacking (covered within ai-alignment and deceptive-alignment).
- Cited in ai-safety as one of the first technical safety agendas.