Concrete Problems in AI Safety

Concrete Problems in AI Safety is a 2016 paper by Dario Amodei, Chris Olah, Jacob Steinhardt, paul-christiano, John Schulman, and Dan Mané (arXiv:1606.06565). It is one of the earliest and most influential technical AI safety research agendas — the paper that helped translate philosophical concerns about advanced AI into concrete problems machine-learning researchers could work on today, with current systems.

Five Concrete Problems

The paper identified five practical research problems framed around an autonomous cleaning robot:

  1. Avoiding Negative Side Effects — how to specify objectives that don’t reward disrupting the environment in unintended ways while pursuing a stated goal.
  2. Avoiding Reward Hacking — preventing systems from gaming a reward function (e.g. covering a mess instead of cleaning it).
  3. Scalable Oversight (scalable-oversight) — efficiently using limited human supervision when full evaluation is impractical.
  4. Safe Exploration — exploring the environment without catastrophic actions during learning.
  5. Robustness to Distributional Shift (distribution-shift) — recognizing and behaving cautiously when the test distribution differs from training.

Why It Mattered

Before this paper, technical AI safety was associated with the MIRI/LessWrong community and largely framed around long-term superintelligence concerns. Concrete Problems demonstrated that the same problem space could be attacked using mainstream ML methods on present-day systems. Its co-authors went on to lead foundational AI safety teams:

  • Dario Amodei: later co-founded Anthropic (2021) and is its CEO.
  • Chris Olah: pioneer of mechanistic interpretability; co-founded Anthropic.
  • paul-christiano: developed iterative-amplification and rlhf; founded ARC and METR (metr).
  • John Schulman: co-author of Unsolved Problems in ML Safety (2021); co-founder of OpenAI; later joined Anthropic.
  • Jacob Steinhardt: Berkeley researcher in robustness and trustworthy ML.

The author list became a kind of leading indicator for the field’s institutional trajectory.

Relationship to Unsolved Problems in ML Safety (2021)

The 2021 follow-up by Hendrycks, Carlini, Schulman, and Steinhardt updated the agenda with four problem areas — robustness, monitoring, alignment, systemic safety — that now structure the ai-safety research taxonomy on Wikipedia.

Connection to This Wiki