Mutual Assured AI Malfunction (MAIM)

Mutual Assured AI Malfunction (MAIM) is a proposed deterrence regime for ASI risk in which unilateral attempts at ASI dominance trigger sabotage by rivals. The AI Safety Atlas (Ch. 3.5) treats it as one of the four main ASI safety strategies (asi-safety-strategies), specifically under deterrence approaches.

The Strategic Logic

MAIM applies the dynamics of nuclear Mutual Assured Destruction (MAD) to ASI development:

  • Multiple actors monitor each other for ASI development progress
  • If any single actor approaches decisive ASI capability, others trigger pre-arranged sabotage actions
  • The credible threat of mutual sabotage disincentivizes unilateral racing

The key insight: MAIM aligns national interests with global safety without requiring perfect cooperation. Each actor has a unilateral incentive to participate, because any attempt to defect and race ahead invites sabotage.
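A minimal game-theoretic sketch of this claim, with payoff values invented purely for illustration (they are not from the Atlas): when sabotage is credible, racing is never a best response, so mutual restraint is the equilibrium.

```python
# Toy two-player deterrence game. Each actor chooses "restrain" or "race";
# a racer is sabotaged by the rival, so racing never yields an advantage.
# All payoff values are illustrative assumptions, not from the source.

# payoff[(my_action, rival_action)] -> my payoff
PAYOFF = {
    ("restrain", "restrain"): 0,    # stable status quo
    ("restrain", "race"):     -1,   # rival races but is sabotaged; minor loss
    ("race",     "restrain"): -5,   # my race is detected and sabotaged
    ("race",     "race"):     -6,   # both race, both are sabotaged
}

def best_response(rival_action: str) -> str:
    """Return the action maximizing my payoff given the rival's action."""
    return max(("restrain", "race"), key=lambda a: PAYOFF[(a, rival_action)])

# Restraint is a best response to either rival action, so in this toy game
# (restrain, restrain) is the unique Nash equilibrium.
for rival in ("restrain", "race"):
    print(f"rival plays {rival!r:>10} -> best response: {best_response(rival)!r}")
```

The assumed ordering of payoffs (sabotaged racing is worse than tolerating a rival's failed race) is exactly what the credibility requirements below are meant to guarantee.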

What MAIM Requires

  • Detection — surveillance and monitoring sufficient to identify when an actor approaches ASI thresholds
  • Escalation ladders — pre-defined responses that can be credibly triggered
  • Strategic infrastructure placement — assets positioned to enable timely sabotage
  • Credibility — actors must believe rivals will follow through
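As one way to picture the second requirement, an escalation ladder can be sketched as an ordered mapping from monitored signals to pre-committed responses that rivals know in advance. Every trigger and response string below is a hypothetical illustration, not drawn from the Atlas:

```python
# Hypothetical escalation ladder: ordered (trigger, response) rungs.
# All signal names and responses are invented for illustration.

from dataclasses import dataclass

@dataclass(frozen=True)
class Rung:
    trigger: str   # observable signal from monitoring
    response: str  # pre-committed action, known to rivals in advance

LADDER = [
    Rung("large compute buildout detected",    "diplomatic demarche, demand inspection"),
    Rung("inspection refused",                 "export controls on chips and tooling"),
    Rung("covert frontier training confirmed", "cyber sabotage of the training run"),
    Rung("imminent ASI capability assessed",   "kinetic strike on facilities (last rung)"),
]

def respond(signal: str) -> str:
    """Look up the pre-committed response for a detected signal."""
    for rung in LADDER:
        if rung.trigger == signal:
            return rung.response
    return "no pre-committed response (signal outside the ladder)"

print(respond("inspection refused"))
```

The point of pre-commitment is credibility: if responses were improvised rather than fixed on the ladder, rivals could gamble that escalation would stall.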

Limitations

The Atlas catalogs structural problems:

Detection Threshold Ambiguity

Nuclear weapons have distinct physical signatures (fissile-material production, weapon assembly, testing); ASI development lacks comparable detection thresholds. When does an AI training run become an ASI attempt? Compute thresholds are imprecise, and capability evaluations are imperfect.
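A back-of-the-envelope calculation shows the imprecision. The cluster size, per-chip throughput, utilization, and the 1e26 FLOP threshold below are assumptions chosen only for illustration:

```python
# Rough training-compute estimate:
#   FLOP = chips * peak FLOP/s per chip * utilization * seconds of training.
# All numbers are illustrative assumptions; the point is the uncertainty band.

def training_flop(chips: int, peak_flops: float, utilization: float, days: float) -> float:
    return chips * peak_flops * utilization * days * 86_400

THRESHOLD = 1e26  # example regulatory threshold in FLOP (assumed)

# The same physical 10,000-chip cluster under two sets of assumptions:
low  = training_flop(chips=10_000, peak_flops=1e15, utilization=0.30, days=30)
high = training_flop(chips=10_000, peak_flops=2e15, utilization=0.50, days=120)

for label, flop in (("low estimate", low), ("high estimate", high)):
    verdict = "over" if flop > THRESHOLD else "under"
    print(f"{label}: {flop:.2e} FLOP -> {verdict} threshold")
```

Under the pessimistic assumptions the cluster lands roughly an order of magnitude below the threshold; under the optimistic ones it crosses it. An observer who can count chips but not observe utilization or training duration cannot tell which case holds.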

Distributed/Concealed Training

Technology could enable ASI development that is harder to detect than a nuclear weapons program (see the sketch after this list):

  • Decentralized training over volunteer compute
  • Concealed national programs
  • Air-gapped facilities with off-grid power
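To see why the first mode frustrates compute-based detection, consider a toy aggregation; every figure here is invented for illustration:

```python
# Toy illustration: many small, individually unremarkable contributors can
# sum to a frontier-scale training run. All figures are invented assumptions.

volunteer_gpus = 200_000   # consumer cards scattered across households
per_gpu_flops  = 5e13      # ~50 TFLOP/s effective per card (assumed)
efficiency     = 0.10      # heavy communication overhead of distributed training
days           = 365

aggregate = volunteer_gpus * per_gpu_flops * efficiency * days * 86_400
print(f"aggregate compute: {aggregate:.2e} FLOP over one year")
# ~3e25 FLOP, with no single facility to inspect or sabotage.
```

Even at 10% efficiency, the swarm accumulates compute comparable to a large datacenter run, while each individual node looks like ordinary consumer hardware.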

Historical Precedent of Inaction

“Historical precedent suggests nations rarely escalate sufficiently for treaty enforcement.” MAD only worked because nuclear weapons are visibly catastrophic; ASI development can be framed ambiguously, and escalation may face domestic political opposition.

Distinction from Other ASI Strategies

| Strategy | Logic | Vulnerability |
| --- | --- | --- |
| MAIM | Deter unilateral racing via the threat of sabotage | Detection thresholds, escalation credibility |
| Global Moratorium (global-moratorium) | Halt AI development by agreement | Defection by single actors |
| Cooperation institutions (CERN, MAGIC) | Centralize and govern development | Requires more cooperation than deterrence |
| Pivotal Acts (pivotal-act) | Use the first aligned ASI to end the risk period | Aligning that ASI in the first place |

In short, MAIM relies on active deterrence where a moratorium relies on passive agreement, so the two have different failure modes.

The Yudkowsky Variant

Yudkowsky’s extreme position is sometimes framed as a hard variant of MAIM: halt AI research entirely, shut down GPU clusters, cap compute, and enforce these limits by military action if necessary to prevent catastrophic scenarios. This trades MAIM’s distributed deterrence for centralized enforcement.
