Risk Decomposition

A two-dimensional framework — drawn from the AI Safety Atlas (Ch.2) — for categorizing AI risks. The first dimension is cause (why risks occur); the second is severity (how bad they get). Real-world risks usually combine multiple causes and span severity levels.

Causes — Three Categories

Classifying risks by causal responsibility identifies where interventions can be targeted.

Misuse Risks

Humans deliberately deploy AI to cause harm. The AI may function exactly as designed; human intent creates the risk. Examples:

  • Bioweapon design and DNA-synthesis evasion
  • AI-generated malware, deepfakes, prompt-injection attacks
  • Autonomous weapons systems
  • Large-scale disinformation campaigns

Covered in depth in the wiki’s biosecurity, autonomous-weapons, and ai-military-applications pages.

Misalignment Risks

AI systems pursue goals different from human intentions. Three sub-mechanisms:

  • Specification failures — wrong training signal
  • Generalization failures — correct signal, wrong learned objective (goal-misgeneralization)
  • Instrumental subgoals — self-preservation, power-seeking emerging from optimization

Treated in the ai-alignment, deceptive-alignment, and instrumental-convergence pages.

Systemic Risks

Emergent threats from AI integration with complex global systems. No single actor intends the harm; responsibility is diffuse. Examples:

  • Power concentration via foundation-model centralization
  • Mass unemployment from broad task automation
  • epistemic-erosion from AI-generated content flooding information ecosystems
  • value-lock-in once AI is deeply embedded in society

See systemic-risks for the consolidated treatment.

Real Risks Combine Categories

Atlas analysis of 1,600+ documented AI risks shows that most do not fit cleanly into a single category — multi-agent risks, misuse that enables misalignment, systemic pressures that amplify individual failures.
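
One way to see why combination matters is to sketch the schema as a small data model. The Python below is illustrative only (class and field names are hypothetical, not taken from the Atlas); it tags each risk with a set of causes and a single severity level, so a scenario such as misuse enabling misalignment is represented directly.

  from dataclasses import dataclass
  from enum import Enum, auto

  class Cause(Enum):
      """First dimension: why the risk occurs."""
      MISUSE = auto()        # humans deliberately deploy AI to cause harm
      MISALIGNMENT = auto()  # AI pursues goals different from human intentions
      SYSTEMIC = auto()      # emergent harm from AI embedded in complex systems

  class Severity(Enum):
      """Second dimension: how bad the outcome gets."""
      INDIVIDUAL = auto()    # specific people or communities affected
      CATASTROPHIC = auto()  # ~10% of global population; recovery possible
      EXISTENTIAL = auto()   # humanity never recovers its full potential

  @dataclass
  class Risk:
      """A risk tagged on both dimensions; `causes` is a set because
      most documented risks combine cause categories."""
      name: str
      causes: set[Cause]
      severity: Severity

  # Illustrative entry: deliberate misuse that also exploits misalignment,
  # assessed at the catastrophic level.
  example = Risk(
      name="AI-assisted bioweapon design",
      causes={Cause.MISUSE, Cause.MISALIGNMENT},
      severity=Severity.CATASTROPHIC,
  )

Filtering or counting risks by cause or severity then becomes a one-line query, which is the practical payoff of a clean decomposition.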

Severity — Three Levels

Individual / Local

Specific people or communities are affected. The AI Incident Database documents 1,000+ cases — autonomous-vehicle crashes, hiring-algorithm bias, privacy leakage, targeted misinformation. These harms are already occurring.

Catastrophic

Affects ~10% of global population; recovery possible. Historical reference points: Black Death (one-third of Europe), 1918 flu (50–100M deaths). AI-relevant scenarios: nation-scale infrastructure attacks, AI-enabled authoritarianism, sustained AI disinformation breaking shared reality.

Existential

Humanity could never recover its full potential. Cited definition (Bostrom 2002): “an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential.” AI examples: ai-takeover-scenarios, stable-totalitarianism, direct extinction.

The irreversibility argument: existential outcomes preclude learning from failure. This justifies preventative attention to low-probability, high-impact scenarios. See near-term-harms-vs-x-risk for the strategic debate this generates.

Alternative Axes

The Atlas notes other useful classification axes, complementary to the primary cause/severity decomposition:

  • Who — humans vs. AI systems vs. emergent multi-agent dynamics
  • When — development time vs. deployment time
  • Intended vs. unintended outcomes

Two additional, off-axis severity types are covered in alternative-risk-categories:

  • i-risks (ikigai) — humans survive but lose meaning
  • s-risks (suffering) — astronomical suffering futures

Connection to Wiki

This page is the navigational schema for Ch.2 of the textbook and for the wiki’s risk-landscape pages.
