AI Safety Atlas Ch.2 — Risk Decomposition

Source: Risk Decomposition | 8 min read | Authors: Markov Grey & Charbel-Raphaël Ségerie

Before examining concrete risk scenarios, a categorization framework is needed. This subchapter operationalizes “why risks occur” (cause) and “how bad can they get” (severity), with side notes on alternative axes (“who causes them,” “when they emerge,” “intended vs. unintended”).

Causes — Three Categories

See risk-decomposition for the standalone concept page.

  • Misuse — humans deliberately deploying AI for harm. Bioweapons, autonomous weapons, large-scale disinformation. The AI may function exactly as designed; human intent creates the risk.
  • Misalignment — AI systems pursuing goals different from human intentions. Specification gaming, goal misgeneralization, unaligned learned objectives. “The AI system itself generates the harmful behavior.”
  • Systemic — emergent threats that no single actor intended, arising from AI’s integration with global systems. Power concentration, mass unemployment, epistemic erosion, cascading infrastructure failures. Responsibility becomes diffuse.

The Atlas notes that most real-world risks combine multiple causal pathways. An analysis of 1,600+ documented AI risks shows that many don’t fit cleanly into a single category: multi-agent risks, misuse that enables misalignment, systemic pressures that amplify individual failures.

Severity — Three Levels

  • Individual / local — affects specific people or communities; the AI Incident Database documents 1,000+ such cases (autonomous car crashes, algorithmic bias, privacy violations).
  • Catastrophic — affects roughly 10% of the global population; recovery remains possible. Historical reference points: the Black Death and the 1918 flu (50–100M deaths). AI examples: nation-scale infrastructure attacks, AI-enabled authoritarianism over hundreds of millions of people, sustained AI disinformation breaking shared reality.
  • Existential (x-risk) — humanity could never recover its full potential. Cited definition: “Existential risk is one where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential.” (Bostrom, 2002). AI examples: permanent disempowerment, stable totalitarianism, direct extinction.

The chapter emphasizes the irreversibility argument: if an existential AI catastrophe occurs, humanity cannot learn from the failure and add safeguards afterward. This justifies prioritizing prevention of low-probability, high-impact scenarios alongside addressing current harms.

A concrete data point: “companies claim they’ll achieve AGI within the decade, yet none scored above D in existential safety planning according to the AI Safety Index Report for summer 2025.”

Alternative Risk Categories

The chapter introduces two additional risk categories that sit beyond the main severity spectrum — see alternative-risk-categories:

  • Ikigai risks (i-risks) — humans survive and prosper but lose meaning and purpose. AI becomes more capable than humans at all meaningful activities. “Existentially adrift” despite material safety.
  • Existential suffering risks (s-risks) — futures filled with astronomical suffering, potentially involving trillions of artificial sentient beings under terrible conditions (digital slavery, careless suffering-rich simulations).

Connection to Wiki

This subchapter is the schema page for the rest of Ch.2: