AI Safety Atlas Ch.3 — Introduction
Source: Strategies — Introduction | Authors: Markov Grey & Charbel-Raphaël Ségerie | Updated Summer 2025 | 3 min
The chapter lays out the big-picture strategy for mitigating the risks explored in Ch.2, organizing it into three primary categories under a defense-in-depth philosophy.
Three Strategy Families
- Misuse prevention — access controls and technical safeguards limiting harmful applications
- AGI/ASI safety — alignment and control measures for advanced systems
- Socio-technical interventions — governance, security, and culture measures that cut across the other two families
The central thesis: "a comprehensive approach that combines many of these strategies" works better than any strategy implemented in isolation. This is the defense-in-depth framework; see atlas-ch3-strategies-07-combining-strategies for the integrated four-step sequence.
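One way to see why layering helps (an illustrative sketch, not a model from the Atlas): if each safeguard layer independently catches a given failure with some probability, the residual risk shrinks multiplicatively as layers are added.

```latex
% Illustrative sketch, not from the Atlas: assume n safeguard layers,
% where layer i independently catches a failure with probability p_i.
% The residual risk that a failure slips past every layer is
\[
  P(\text{failure}) = \prod_{i=1}^{n} (1 - p_i)
\]
% e.g., three modest layers with p = 0.7, 0.5, 0.6 leave
% 0.3 \times 0.5 \times 0.4 = 0.06, well below any single layer alone.
```

Real safeguards are correlated, so the independence assumption overstates the benefit; the qualitative point, that layered strategies compound, is what the defense-in-depth framing relies on.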
Scope Limitations
The chapter explicitly excludes the following topics:
- AI-generated misinformation and deepfakes (covered partially in epistemic-erosion)
- Data privacy concerns
- Standard cybersecurity practices
- Bias and toxicity issues
- AI welfare considerations
- Capability gaps unrelated to misalignment
This focused scope reflects the safety community's emphasis on existential and large-scale catastrophic risks from advanced, potentially agentic AI systems.
Connection to Wiki
Ch.3 maps onto and substantially deepens existing wiki strategy pages:
- ai-control — the Atlas grounds its treatment in Greenblatt et al. 2024
- differential-development — the strategic counter to race dynamics
- responsible-scaling-policy — Anthropic’s if-then framework
- superalignment — the strategy of automating alignment research
- scalable-oversight — chain-of-thought monitoring is one operationalization