AI Safety Atlas Ch.3 — Combining Strategies
Source: Combining Strategies
A sequential four-step roadmap for layering multiple safety strategies across risk horizons. It is presented not as definitive but as an illustration of how “different layers of defense could be built upon one another.” As the chapter’s integration page, it shows how the previous five strategy subchapters fit together.
The Four-Step Sequence
Step 1 — Foundational Governance
“Without a safety culture and basic risk management, technical solutions will not be implemented correctly.”
Requires:
- Robust governance frameworks
- International coordination
- ai-safety-culture and ai-risk-management practices
The Atlas critiques gaps in current governance: EU AI Act fines are capped at levels insufficient to deter well-resourced actors, and exemptions for military and internal research create loopholes.
Step 2 — Misuse Prevention
Tackled second because misuse is a present danger requiring only sub-AGI capabilities, and because success buys time and “builds the societal ‘muscles’ for governing more powerful systems.”
Components: access controls (API gating, KYC, staged release) and technical safeguards (circuit breakers, machine unlearning, tamper-resistant safeguards).
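A minimal sketch of how these access controls might compose, assuming a tiered release scheme. The `ReleaseStage` tiers, `gate_request` logic, and the usage-policy filter are illustrative inventions, not an API specified by the Atlas:

```python
from dataclasses import dataclass
from enum import IntEnum


class ReleaseStage(IntEnum):
    """Staged release: each tier widens the audience with access."""
    INTERNAL = 0      # developers only
    RED_TEAM = 1      # vetted external evaluators
    COMMERCIAL = 2    # KYC-verified API customers
    PUBLIC = 3        # general availability


@dataclass
class Requester:
    user_id: str
    kyc_verified: bool      # identity confirmed out of band (KYC)
    tier: ReleaseStage      # widest stage this caller belongs to


def violates_usage_policy(prompt: str) -> bool:
    """Placeholder misuse filter; a real system would use a classifier."""
    return "synthesize pathogen" in prompt.lower()


def gate_request(req: Requester, model_stage: ReleaseStage, prompt: str) -> bool:
    """Admit a request only if it clears every layered check."""
    if not req.kyc_verified:           # API gating: know who is calling
        return False
    if req.tier > model_stage:         # staged release: tier not yet open
        return False
    if violates_usage_policy(prompt):  # per-request technical safeguard
        return False
    return True
```

For example, `gate_request(Requester("u1", True, ReleaseStage.PUBLIC), ReleaseStage.RED_TEAM, "hello")` returns False: the caller is KYC-verified, but the model has only been staged to red-teamers, so the public tier is not yet open.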
Step 3 — AGI Control and Alignment
As systems approach AGI capabilities:
- Transparent thoughts (chain-of-thought-monitoring)
- Rigorous evaluations
- Continuous monitoring
Key principle: development scales only as fast as control can be demonstrated. Pause if audits reveal alignment failures.
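The gating principle can be made concrete with a small sketch. The eval hooks below (`chain_of_thought_monitor` and the rest) are hypothetical placeholder names corresponding to the three bullets above, not interfaces defined by the Atlas:

```python
from typing import Callable

# Each control eval returns True only if the model demonstrably passes.
ControlEval = Callable[[object], bool]


def chain_of_thought_monitor(model) -> bool:
    """Transparent thoughts: scan sampled CoT traces for red flags."""
    return True  # placeholder verdict


def dangerous_capability_eval(model) -> bool:
    """Rigorous evaluations, e.g. bio/cyber uplift benchmarks."""
    return True  # placeholder verdict


def deployment_telemetry_check(model) -> bool:
    """Continuous monitoring of behavior in deployment."""
    return True  # placeholder verdict


def scale_one_step(model, train_next_step: Callable,
                   evals: list[ControlEval]):
    """Advance one capability step only after every control eval passes."""
    failures = [e.__name__ for e in evals if not e(model)]
    if failures:
        # Audits revealed control/alignment failures: pause, do not scale.
        raise RuntimeError(f"Scaling paused; failed evals: {failures}")
    return train_next_step(model)
```

The design point is that the pause is the default: scaling proceeds only when every check affirmatively passes, rather than halting only when a check affirmatively fails.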
Step 4 — ASI Alignment Solutions
For superhuman systems:
- Use controlled AGI to automate alignment research (the OpenAI Superalignment vision)
- If that fails: coordination or deterrence as final options (MAIM, moratorium, pivotal acts)
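A hedged sketch of that fallback ordering, with placeholder predicates standing in for outcomes the Atlas leaves open:

```python
def automated_alignment_research(agi) -> bool:
    """Use controlled AGI to produce and verify alignment solutions."""
    return False  # placeholder: assume the hard case


def coordination_holds() -> bool:
    """Do major actors agree to pause? (Step 1 groundwork matters here.)"""
    return True   # placeholder


def asi_strategy(controlled_agi) -> str:
    """Step 4's fallback ordering for superhuman systems."""
    if automated_alignment_research(controlled_agi):
        return "superalignment: deploy with verified alignment"
    if coordination_holds():
        return "coordination: moratorium on further scaling"
    return "deterrence: MAIM / pivotal acts (last resort)"
```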
The Honest Caveat
The Atlas explicitly acknowledges that “this plan may be insufficient,” noting scenarios where “humanity survives not because of a grand strategic plan, but despite the failure of most governance” efforts, relying instead on warning events that prompt sufficient technical responses.
This is unusually honest for a strategic textbook: the explicit position is that the plan is necessary but probably not sufficient, and that survival may depend on the field’s improvisational response to crises rather than prior strategic foresight.
Connection to Wiki
This subchapter is the navigational page for understanding how all Ch.3 strategies relate. It’s referenced from every previous Ch.3 summary. It also clarifies:
- Why differential-development and responsible-scaling-policy sit at Steps 1–2
- Why ai-control and chain-of-thought-monitoring sit at Step 3
- Why superalignment and asi-safety-strategies sit at Step 4
- Why proposals to “wait for the warning shot” remain dangerous (Step 1 should already be in place before Step 4 is reached)
Related Pages
- ai-safety-atlas-textbook
- defense-in-depth
- ai-safety-culture
- ai-risk-management
- ai-governance
- misuse-prevention-strategies
- chain-of-thought-monitoring
- ai-control
- asi-safety-strategies
- superalignment
- mutual-assured-ai-malfunction
- pivotal-act
- differential-development
- responsible-scaling-policy
- atlas-ch3-strategies-03-misuse-prevention-strategies
- atlas-ch3-strategies-04-agi-safety-strategies
- atlas-ch3-strategies-05-asi-safety-strategies
- atlas-ch3-strategies-06-socio-technical-strategies