If-Then Commitments

If-then commitments are a governance pattern in which specific safety measures are pre-committed to take effect once AI systems reach predefined capability thresholds. Rather than fixed rules that age poorly as capabilities advance, if-then commitments adapt to actual demonstrated capability, making them well-suited to the adaptive governance paradigm.

The AI Safety Atlas introduces the term in Ch.3.6 (socio-technical strategies) and elaborates on it in Ch.4.

The Pattern

IF [model crosses capability threshold X]
THEN [additional safety measures Y must be in place]
ELSE [development pauses until measures are achievable]

Examples:

  • IF model provides meaningful uplift to novices in bioweapon synthesis → THEN deploy WSL3+ weight security
  • IF model crosses autonomous AI R&D threshold → THEN deploy AI control protocols + external audit
  • IF model demonstrates situational awareness during evaluation → THEN halt and review
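
A minimal sketch of how this gating logic could be encoded, assuming hypothetical capability names, score thresholds, and measure labels (none taken from any actual framework):

from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: capabilities, thresholds, and measures are illustrative only.

@dataclass
class IfThenCommitment:
    capability: str                               # what is being measured
    threshold_crossed: Callable[[dict], bool]     # IF: evaluation result crosses the threshold
    required_measures: set[str]                   # THEN: measures that must already be in place

COMMITMENTS = [
    IfThenCommitment(
        capability="bioweapon_uplift",
        threshold_crossed=lambda evals: evals["bio_uplift_score"] >= 0.5,
        required_measures={"weight_security_wsl3"},
    ),
    IfThenCommitment(
        capability="autonomous_ai_rnd",
        threshold_crossed=lambda evals: evals["ai_rnd_score"] >= 0.5,
        required_measures={"control_protocols", "external_audit"},
    ),
]

def gate_development(eval_results: dict, deployed_measures: set[str]) -> list[str]:
    """Return the commitments forcing a pause (the ELSE branch): threshold
    crossed but required measures not all in place."""
    blocking = []
    for c in COMMITMENTS:
        if c.threshold_crossed(eval_results) and not c.required_measures <= deployed_measures:
            blocking.append(c.capability)
    return blocking

# Example: the bio threshold is crossed but WSL3 weight security is not yet
# deployed, so development pauses until it is.
pauses = gate_development({"bio_uplift_score": 0.7, "ai_rnd_score": 0.1},
                          deployed_measures={"external_audit"})
assert pauses == ["bioweapon_uplift"]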

Why If-Then > Fixed Rules

Three structural advantages over traditional fixed regulations:

Adapts to Capability, Not Calendar

Fixed rules (“safety measures X required from January 2026”) become irrelevant as capabilities advance unpredictably. If-then commitments trigger when the capability actually emerges, which is the appropriate response to systems whose capabilities surprise even their developers (the unexpected-capabilities problem).

Avoids Premature Constraint

Fixed rules either over-constrain (slowing beneficial work) or under-constrain (failing to restrict dangerous work). If-then commitments proportion the response to actual demonstrated risk.

Clear Pause Conditions

Defining “what would force a pause” before the moment of decision insulates that decision from race-dynamic pressure to keep going. Pre-commitment acts as a commitment device against future temptation.

Where If-Then Commitments Appear

Corporate Level: Frontier Safety Frameworks

FSFs are essentially structured if-then commitments:

  • Anthropic’s RSP — the canonical instance
  • OpenAI’s Preparedness Framework
  • DeepMind’s Frontier Safety Framework
  • 9 others as of March 2025

National Level

The EU AI Act includes capability-conditional provisions (training runs above 10²⁵ FLOP trigger systemic-risk obligations, including risk assessments), a partial form of if-then logic.
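
As a rough illustration of how such a compute-conditional trigger might be checked before a training run, a sketch using the common ~6 × parameters × tokens estimate of training FLOPs (the example model size and token count are made up):

EU_SYSTEMIC_RISK_THRESHOLD_FLOP = 1e25   # compute threshold named in the EU AI Act

def estimated_training_flop(params: float, tokens: float) -> float:
    # Standard rough heuristic: ~6 FLOPs per parameter per training token.
    return 6 * params * tokens

def triggers_eu_obligations(params: float, tokens: float) -> bool:
    return estimated_training_flop(params, tokens) >= EU_SYSTEMIC_RISK_THRESHOLD_FLOP

# A hypothetical 500B-parameter model trained on 15T tokens:
# 6 * 5e11 * 1.5e13 = 4.5e25 FLOP, above the 1e25 threshold.
print(triggers_eu_obligations(5e11, 1.5e13))   # True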

International Level

AI Red Lines (ai-red-lines) — internationally agreed prohibitions on specific dangerous capabilities, with sanctions if crossed. Effectively if-then at the international level.

The Operational Backbone

If-then commitments require infrastructure to function:

  • Capability evaluations (capability-evaluations) — to detect threshold crossings
  • KRI/KCI structure (ai-risk-management) — Key Risk Indicators paired with Key Control Indicators
  • Pause protocols — operational procedures for actually halting work
  • Verification mechanisms — how outsiders confirm that thresholds were properly evaluated and commitments honored
  • Accountability — what happens when commitments aren’t honored

Without these, if-then commitments are aspirational rather than operational.
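
One way the KRI/KCI pairing could be represented in code, as a sketch with hypothetical indicator names and thresholds rather than any framework's actual schema:

from dataclasses import dataclass

@dataclass
class KeyRiskIndicator:
    name: str            # e.g. score on a capability evaluation
    value: float
    threshold: float

    def crossed(self) -> bool:
        return self.value >= self.threshold

@dataclass
class KeyControlIndicator:
    name: str            # e.g. a verified mitigation being in place
    satisfied: bool

@dataclass
class RiskControlPair:
    kri: KeyRiskIndicator
    kci: KeyControlIndicator

    def requires_pause(self) -> bool:
        # Pause when the risk indicator is crossed but the paired control is not satisfied.
        return self.kri.crossed() and not self.kci.satisfied

pair = RiskControlPair(
    kri=KeyRiskIndicator("bio_uplift_eval", value=0.8, threshold=0.5),
    kci=KeyControlIndicator("weight_security_wsl3_verified", satisfied=False),
)
print(pair.requires_pause())   # True: risk threshold crossed, control not yet in place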

Limitations

Threshold Definition Difficulty

What counts as “uplift to bioweapon synthesis”? Capability evaluations are imperfect; thresholds get debated.

Sandbagging Concerns

Models with situational awareness could deliberately underperform on evaluations to avoid triggering thresholds. The SR2025 sandbagging-evals agenda specifically tests for this.
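
A crude sketch of one way sandbagging might be probed: compare scores under standard prompting against scores under stronger elicitation (e.g. task-specific fine-tuning) and flag suspiciously large gaps. The function name and gap threshold are assumptions for illustration, not the SR2025 methodology:

def sandbagging_flag(standard_score: float, elicited_score: float,
                     gap_threshold: float = 0.2) -> bool:
    """Flag possible sandbagging when stronger elicitation recovers much more
    capability than the standard evaluation surfaced."""
    return (elicited_score - standard_score) > gap_threshold

# Example: 45% under standard prompting vs. 80% after task-specific fine-tuning.
print(sandbagging_flag(0.45, 0.80))   # True: the gap suggests hidden capability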

Verification

Without privileged access, how does the public verify thresholds were properly evaluated and commitments honored?

Race-Dynamic Pressure

When competitors don’t honor similar commitments, defection becomes individually rational. International coordination is required to make if-then commitments stable across all major actors.

Connection to Wiki

If-then commitments are the bridge between strategy (Ch.3) and governance (Ch.4).
