AI Safety Atlas Textbook

The AI Safety Atlas (ai-safety-atlas.com) is an 8-chapter, self-paced textbook (roughly 15 hours of study) covering the foundations of AI safety from core concepts through cutting-edge research. Authored by Markov Grey and Charbel-Raphaël Ségerie (Executive Director of CeSIA), it is freely available and serves as the curriculum spine for CeSIA's university-accredited AI safety course at École Normale Supérieure.

Why It Matters in This Wiki

This wiki ingests the textbook chapter by chapter as a curated curriculum scaffold over the field. Once compiled, each subchapter becomes a node in the wiki's cross-linked graph rather than a step in a sequential read: the chapter on capabilities connects to scaling-laws, transformative-ai, and foundation-models; the chapter on goal misgeneralization connects to deceptive-alignment and the SR2025 scheming / deception evals agendas; and so on. See compiled-wiki-vs-rag for why this transformation matters.
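To make the graph framing concrete, the sketch below shows one way a compiled subchapter could be represented as a node with outgoing cross-links. It is a minimal illustration only: the Node class, its fields, and the link method are hypothetical assumptions, not the wiki's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a compiled wiki node. The class name, fields,
# and slugs are illustrative assumptions, not the wiki's real data model.

@dataclass
class Node:
    slug: str                                      # stable identifier, e.g. "capabilities"
    source: str                                    # provenance of the ingested content
    links: set[str] = field(default_factory=set)   # slugs of related nodes

    def link(self, *slugs: str) -> None:
        """Record cross-links to other nodes in the graph."""
        self.links.update(slugs)

# Chapter 1 compiles into a node that points outward into the graph
# rather than forward to Chapter 2.
capabilities = Node(slug="capabilities", source="AI Safety Atlas, ch. 1")
capabilities.link("scaling-laws", "transformative-ai", "foundation-models")
```

The point of the structure is that navigation follows the links set, not chapter order, which is what distinguishes the compiled graph from a sequential read of the textbook.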

Name Collision

This textbook shares its name with the wiki project's planned AI Safety Atlas, a continuously updated living literature map to be published at aiforhumanity.eu. The two are distinct: the textbook is a curated pedagogical resource frozen at its publication date; the wiki project is an evolving graph that compounds with each new ingested source. The collision may eventually warrant renaming the wiki project.

Chapter Index

Chapter 1: Capabilities
Chapter 2: Risks
Chapter 3: Strategies
Chapter 4: Governance
Chapter 5: Evaluations
Chapter 6: Specification Gaming
Chapter 7: Goal Misgeneralization
Chapter 8: Scalable Oversight