Concrete Problems in AI Safety
Concrete Problems in AI Safety is a 2016 paper by Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané (arXiv:1606.06565). It is one of the earliest and most influential technical AI safety research agendas: the paper helped translate philosophical concerns about advanced AI into concrete problems that machine-learning researchers could work on with current systems.
Five Concrete Problems
The paper identified five practical research problems framed around an autonomous cleaning robot:
- Avoiding Negative Side Effects — how to specify objectives that don’t reward disrupting the environment in unintended ways while pursuing a stated goal.
- Avoiding Reward Hacking — preventing systems from gaming a reward function (e.g. covering a mess instead of cleaning it; a toy sketch follows this list).
- Scalable Oversight (scalable-oversight) — efficiently using limited human supervision when full evaluation is impractical.
- Safe Exploration — exploring the environment without catastrophic actions during learning.
- Robustness to Distributional Shift (distribution-shift) — recognizing and behaving cautiously when the test distribution differs from training.
Why It Mattered
Before this paper, technical AI safety was associated with the MIRI/LessWrong community and largely framed around long-term superintelligence concerns. Concrete Problems demonstrated that the same problem space could be attacked using mainstream ML methods on present-day systems. Its co-authors went on to lead foundational AI safety teams:
- Dario Amodei: later co-founded Anthropic (2021) and is its CEO.
- Chris Olah: pioneer of mechanistic interpretability; co-founded Anthropic.
- paul-christiano: developed iterative-amplification and rlhf; founded the Alignment Research Center (ARC), whose evaluations team later spun out as METR (metr).
- John Schulman: co-founder of OpenAI; co-author of Unsolved Problems in ML Safety (2021); later joined Anthropic.
- Jacob Steinhardt: Berkeley researcher in robustness and trustworthy ML.
The author list became a kind of leading indicator for the field’s institutional trajectory.
Relationship to Unsolved Problems in ML Safety (2021)
The 2021 follow-up by Hendrycks, Carlini, Schulman, and Steinhardt updated the agenda with four problem areas — robustness, monitoring, alignment, systemic safety — that now structure the ai-safety research taxonomy on Wikipedia.
Connection to This Wiki
- Central reference in the history section of the ai-safety page.
- Foreshadows multiple wiki pages: scalable-oversight, distribution-shift, robustness, reward hacking (covered within ai-alignment and deceptive-alignment).
- Cited in ai-safety as one of the first technical safety agendas.