AI Safety Atlas Ch.3 — Combining Strategies
Source: Combining Strategies
A sequential four-step roadmap for layering multiple safety strategies across risk horizons. It is presented not as definitive but as an illustration of how “different layers of defense could be built upon one another.” As the chapter’s integration page, it shows how the previous five strategy subchapters fit together.
The Four-Step Sequence
Step 1 — Foundational Governance
“Without a safety culture and basic risk management, technical solutions will not be implemented correctly.”
Requires:
- Robust governance frameworks
- International coordination
- ai-safety-culture and ai-risk-management practices
The Atlas critiques gaps in current governance: EU AI Act fines are capped at levels insufficient to deter well-resourced actors, and exemptions for military and internal research create loopholes.
Step 2 — Misuse Prevention
Tackled second because misuse is a present danger requiring only sub-AGI capabilities, and because success buys time and “builds the societal ‘muscles’ for governing more powerful systems.”
Components: access controls (API gating, KYC, staged release) and technical safeguards (circuit breakers, machine unlearning, tamper-resistant safeguards).
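A minimal sketch of how these access controls might compose, assuming a tiered release scheme. The `ReleaseStage` tiers, `gate_request` logic, and the usage-policy filter are illustrative inventions, not an API specified by the Atlas:

```python
from dataclasses import dataclass
from enum import IntEnum


class ReleaseStage(IntEnum):
    """Staged release: each tier widens the audience with access."""
    INTERNAL = 0      # developers only
    RED_TEAM = 1      # vetted external evaluators
    COMMERCIAL = 2    # KYC-verified API customers
    PUBLIC = 3        # general availability


@dataclass
class Requester:
    user_id: str
    kyc_verified: bool      # identity confirmed out of band (KYC)
    tier: ReleaseStage      # widest stage this caller belongs to


def violates_usage_policy(prompt: str) -> bool:
    """Placeholder misuse filter; a real system would use a classifier."""
    return "synthesize pathogen" in prompt.lower()


def gate_request(req: Requester, model_stage: ReleaseStage, prompt: str) -> bool:
    """Admit a request only if it clears every layered check."""
    if not req.kyc_verified:           # API gating: know who is calling
        return False
    if req.tier > model_stage:         # staged release: tier not yet open
        return False
    if violates_usage_policy(prompt):  # per-request technical safeguard
        return False
    return True
```

For example, `gate_request(Requester("u1", True, ReleaseStage.PUBLIC), ReleaseStage.RED_TEAM, "hello")` returns False: the caller is KYC-verified, but the model has only been staged to red-teamers, so the public tier is not yet open.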
Step 3 — AGI Control and Alignment
As systems approach AGI capabilities:
- Transparent thoughts (chain-of-thought-monitoring)
- Rigorous evaluations
- Continuous monitoring
Key principle: development scales only as fast as control can be demonstrated. Pause if audits reveal alignment failures.
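The gating principle can be made concrete with a small sketch. The eval hooks below (`chain_of_thought_monitor` and the rest) are hypothetical placeholder names corresponding to the three bullets above, not interfaces defined by the Atlas:

```python
from typing import Callable

# Each control eval returns True only if the model demonstrably passes.
ControlEval = Callable[[object], bool]


def chain_of_thought_monitor(model) -> bool:
    """Transparent thoughts: scan sampled CoT traces for red flags."""
    return True  # placeholder verdict


def dangerous_capability_eval(model) -> bool:
    """Rigorous evaluations, e.g. bio/cyber uplift benchmarks."""
    return True  # placeholder verdict


def deployment_telemetry_check(model) -> bool:
    """Continuous monitoring of behavior in deployment."""
    return True  # placeholder verdict


def scale_one_step(model, train_next_step: Callable,
                   evals: list[ControlEval]):
    """Advance one capability step only after every control eval passes."""
    failures = [e.__name__ for e in evals if not e(model)]
    if failures:
        # Audits revealed control/alignment failures: pause, do not scale.
        raise RuntimeError(f"Scaling paused; failed evals: {failures}")
    return train_next_step(model)
```

The design point is that the pause is the default: scaling proceeds only when every check affirmatively passes, rather than halting only when a check affirmatively fails.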
Step 4 — ASI Alignment Solutions
For superhuman systems:
- Use controlled AGI to automate alignment research (the OpenAI Superalignment vision)
- If that fails: coordination or deterrence as final options (MAIM, moratorium, pivotal acts)
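A hedged sketch of that fallback ordering, with placeholder predicates standing in for outcomes the Atlas leaves open:

```python
def automated_alignment_research(agi) -> bool:
    """Use controlled AGI to produce and verify alignment solutions."""
    return False  # placeholder: assume the hard case


def coordination_holds() -> bool:
    """Do major actors agree to pause? (Step 1 groundwork matters here.)"""
    return True   # placeholder


def asi_strategy(controlled_agi) -> str:
    """Step 4's fallback ordering for superhuman systems."""
    if automated_alignment_research(controlled_agi):
        return "superalignment: deploy with verified alignment"
    if coordination_holds():
        return "coordination: moratorium on further scaling"
    return "deterrence: MAIM / pivotal acts (last resort)"
```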
The Honest Caveat
The Atlas explicitly acknowledges that “this plan may be insufficient,” noting scenarios where “humanity survives not because of a grand strategic plan, but despite the failure of most governance” efforts, relying instead on warning events that prompt sufficient technical responses.
This is unusually honest for a strategic textbook: the explicit position is that the plan is necessary but probably not sufficient, and that survival may depend on the field’s improvisational response to crises rather than prior strategic foresight.
Connection to Wiki
This subchapter is the navigational page for understanding how all Ch.3 strategies relate. It’s referenced from every previous Ch.3 summary. It also clarifies:
- Why differential-development and responsible-scaling-policy sit at Steps 1–2
- Why ai-control and chain-of-thought-monitoring sit at Step 3
- Why superalignment and asi-safety-strategies sit at Step 4
- Why proposals to “wait for the warning shot” remain dangerous (Step 1 should already be in place before Step 4 is reached)
Related Pages
- ai-safety-atlas-textbook
- defense-in-depth
- ai-safety-culture
- ai-risk-management
- ai-governance
- misuse-prevention-strategies
- chain-of-thought-monitoring
- ai-control
- asi-safety-strategies
- superalignment
- mutual-assured-ai-malfunction
- pivotal-act
- differential-development
- responsible-scaling-policy
- atlas-ch3-strategies-03-misuse-prevention-strategies
- atlas-ch3-strategies-04-agi-safety-strategies
- atlas-ch3-strategies-05-asi-safety-strategies
- atlas-ch3-strategies-06-socio-technical-strategies