AI Safety Atlas Ch.3 — Combining Strategies

Source: Combining Strategies

A sequential four-step roadmap for layering multiple safety strategies across risk horizons. The Atlas presents it not as definitive, but as an illustration of how “different layers of defense could be built upon one another.” This is the chapter’s integration page: it shows how the previous five strategy subchapters fit together.

The Four-Step Sequence

Step 1 — Foundational Governance

“Without a safety culture and basic risk management, technical solutions will not be implemented correctly.”

Requires:

The Atlas critiques gaps in current governance: capped EU AI Act fines are insufficient against well-resourced actors, and exemptions for military use and internal research create loopholes.

Step 2 — Misuse Prevention

Tackled second because misuse is a present danger that requires only sub-AGI capabilities, and because success here buys time and “builds the societal ‘muscles’ for governing more powerful systems.”

Components: access controls (API gating, KYC, staged release), acc, technical safeguards (circuit breakers, machine unlearning, tamper-resistant safeguards).

Step 3 — AGI Control and Alignment

As systems approach AGI capabilities:

Key principle: development scales only as fast as control can be demonstrated. Pause if audits reveal alignment failures.
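The key principle above reads like a gating rule, so a minimal sketch may help make it concrete. This is an illustration only, not anything the Atlas specifies: the integer "capability level", the `AuditResult` fields, and the function name are all hypothetical, and the sketch assumes a pause means holding at the current level rather than rolling back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditResult:
    """Hypothetical audit summary; both fields are illustrative assumptions."""
    control_demonstrated_level: int  # highest level at which control was shown
    alignment_failure_found: bool    # True if the audit revealed a failure


def next_capability_level(current_level: int,
                          requested_level: int,
                          audit: AuditResult) -> int:
    """Return the capability level that development may proceed to.

    Rule 1: pause (stay at the current level) if the audit found an
    alignment failure.
    Rule 2: otherwise, never scale past the level at which control has
    been demonstrated. Assumption: no rollback below the current level.
    """
    if audit.alignment_failure_found:
        return current_level
    allowed = min(requested_level, audit.control_demonstrated_level)
    return max(current_level, allowed)


# A lab at level 2 asks for level 5, but control is only shown up to 3.
print(next_capability_level(2, 5, AuditResult(3, False)))
```

In this toy model the request for level 5 is capped at level 3, and any audit that reports an alignment failure holds development where it is, whatever level was requested.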

Step 4 — ASI Alignment Solutions

For superhuman systems:

  • Use controlled AGI to automate alignment research (the OpenAI Superalignment vision)
  • If that fails: coordination or deterrence as final options (MAIM, moratorium, pivotal acts)

The Honest Caveat

The Atlas explicitly acknowledges “this plan may be insufficient,” noting scenarios where “humanity survives not because of a grand strategic plan, but despite the failure of most governance” efforts — relying instead on warning events that prompt sufficient technical responses.

This is unusually honest for a strategic textbook: the explicit position is that the plan is necessary but probably not sufficient, and that survival may depend on the field’s improvisational response to crises rather than prior strategic foresight.

Connection to Wiki

This subchapter is the navigational page for understanding how all Ch.3 strategies relate. It’s referenced from every previous Ch.3 summary. It also clarifies: