If-Then Commitments

If-then commitments are a governance pattern in which specific safety measures are pre-committed to take effect once AI systems reach predefined capability thresholds. Rather than fixed rules that age poorly as capabilities advance, if-then commitments adapt to actual demonstrated capability, making them well-suited to the adaptive governance paradigm.

The AI Safety Atlas introduces the term in Ch.3.6 (socio-technical strategies) and elaborates on it in Ch.4.

The Pattern

IF [model crosses capability threshold X]
THEN [additional safety measures Y must be in place]
ELSE [development pauses until measures are achievable]

Examples:

  • IF model provides meaningful uplift to novices in bioweapon synthesis → THEN deploy WSL3+ weight security
  • IF model crosses autonomous AI R&D threshold → THEN deploy AI control protocols + external audit
  • IF model demonstrates situational awareness during evaluation → THEN halt and review
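
A minimal sketch of how this gating logic could be encoded, assuming hypothetical capability names, score thresholds, and measure labels (none taken from any actual framework):

from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: capabilities, thresholds, and measures are illustrative only.

@dataclass
class IfThenCommitment:
    capability: str                               # what is being measured
    threshold_crossed: Callable[[dict], bool]     # IF: evaluation result crosses the threshold
    required_measures: set[str]                   # THEN: measures that must already be in place

COMMITMENTS = [
    IfThenCommitment(
        capability="bioweapon_uplift",
        threshold_crossed=lambda evals: evals["bio_uplift_score"] >= 0.5,
        required_measures={"weight_security_wsl3"},
    ),
    IfThenCommitment(
        capability="autonomous_ai_rnd",
        threshold_crossed=lambda evals: evals["ai_rnd_score"] >= 0.5,
        required_measures={"control_protocols", "external_audit"},
    ),
]

def gate_development(eval_results: dict, deployed_measures: set[str]) -> list[str]:
    """Return the commitments forcing a pause (the ELSE branch): threshold
    crossed but required measures not all in place."""
    blocking = []
    for c in COMMITMENTS:
        if c.threshold_crossed(eval_results) and not c.required_measures <= deployed_measures:
            blocking.append(c.capability)
    return blocking

# Example: the bio threshold is crossed but WSL3 weight security is not yet
# deployed, so development pauses until it is.
pauses = gate_development({"bio_uplift_score": 0.7, "ai_rnd_score": 0.1},
                          deployed_measures={"external_audit"})
assert pauses == ["bioweapon_uplift"]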

Why If-Then > Fixed Rules

Three structural advantages over traditional fixed regulations:

Adapts to Capability, Not Calendar

Fixed rules (“safety measures X required from January 2026”) become irrelevant as capabilities advance unpredictably. If-then commitments trigger when the capability actually emerges, which is the appropriate response to systems whose capabilities surprise even their developers (the unexpected-capabilities problem).

Avoids Premature Constraint

Fixed rules either over-constrain (slowing beneficial work) or under-constrain (failing to restrict dangerous work). If-then commitments proportion the response to actual demonstrated risk.

Clear Pause Conditions

Defining “what would force a pause” before the moment of decision insulates that decision from race-dynamic pressure to keep going. Pre-commitment acts as a commitment device against future temptation.

Where If-Then Commitments Appear

Corporate Level: Frontier Safety Frameworks

FSFs are essentially structured if-then commitments:

  • Anthropic’s RSP — the canonical instance
  • OpenAI’s Preparedness Framework
  • DeepMind’s Frontier Safety Framework
  • 9 others as of March 2025

National Level

The EU AI Act includes capability-conditional provisions (training runs above 10²⁵ FLOP trigger systemic-risk obligations, including risk assessments), a partial form of if-then logic.
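
As a rough illustration of how such a compute-conditional trigger might be checked before a training run, a sketch using the common ~6 × parameters × tokens estimate of training FLOPs (the example model size and token count are made up):

EU_SYSTEMIC_RISK_THRESHOLD_FLOP = 1e25   # compute threshold named in the EU AI Act

def estimated_training_flop(params: float, tokens: float) -> float:
    # Standard rough heuristic: ~6 FLOPs per parameter per training token.
    return 6 * params * tokens

def triggers_eu_obligations(params: float, tokens: float) -> bool:
    return estimated_training_flop(params, tokens) >= EU_SYSTEMIC_RISK_THRESHOLD_FLOP

# A hypothetical 500B-parameter model trained on 15T tokens:
# 6 * 5e11 * 1.5e13 = 4.5e25 FLOP, above the 1e25 threshold.
print(triggers_eu_obligations(5e11, 1.5e13))   # True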

International Level

AI Red Lines (ai-red-lines) — internationally agreed prohibitions on specific dangerous capabilities, with sanctions if crossed. Effectively if-then at the international level.

The Operational Backbone

If-then commitments require infrastructure to function:

  • Capability evaluations (capability-evaluations) — to detect threshold crossings
  • KRI/KCI structure (ai-risk-management) — Key Risk Indicators paired with Key Control Indicators
  • Pause protocols — operational procedures for actually halting work
  • Verification mechanisms — how outsiders confirm that thresholds were properly evaluated and commitments honored
  • Accountability — what happens when commitments aren’t honored

Without these, if-then commitments are aspirational rather than operational.
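
One way the KRI/KCI pairing could be represented in code, as a sketch with hypothetical indicator names and thresholds rather than any framework's actual schema:

from dataclasses import dataclass

@dataclass
class KeyRiskIndicator:
    name: str            # e.g. score on a capability evaluation
    value: float
    threshold: float

    def crossed(self) -> bool:
        return self.value >= self.threshold

@dataclass
class KeyControlIndicator:
    name: str            # e.g. a verified mitigation being in place
    satisfied: bool

@dataclass
class RiskControlPair:
    kri: KeyRiskIndicator
    kci: KeyControlIndicator

    def requires_pause(self) -> bool:
        # Pause when the risk indicator is crossed but the paired control is not satisfied.
        return self.kri.crossed() and not self.kci.satisfied

pair = RiskControlPair(
    kri=KeyRiskIndicator("bio_uplift_eval", value=0.8, threshold=0.5),
    kci=KeyControlIndicator("weight_security_wsl3_verified", satisfied=False),
)
print(pair.requires_pause())   # True: risk threshold crossed, control not yet in place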

Limitations

Threshold Definition Difficulty

What counts as “uplift to bioweapon synthesis”? Capability evaluations are imperfect; thresholds get debated.

Sandbagging Concerns

Models with situational awareness could deliberately underperform on evaluations to avoid triggering thresholds. The SR2025 sandbagging-evals agenda specifically tests for this.
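
A crude sketch of one way sandbagging might be probed: compare scores under standard prompting against scores under stronger elicitation (e.g. task-specific fine-tuning) and flag suspiciously large gaps. The function name and gap threshold are assumptions for illustration, not the SR2025 methodology:

def sandbagging_flag(standard_score: float, elicited_score: float,
                     gap_threshold: float = 0.2) -> bool:
    """Flag possible sandbagging when stronger elicitation recovers much more
    capability than the standard evaluation surfaced."""
    return (elicited_score - standard_score) > gap_threshold

# Example: 45% under standard prompting vs. 80% after task-specific fine-tuning.
print(sandbagging_flag(0.45, 0.80))   # True: the gap suggests hidden capability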

Verification

Without privileged access, how does the public verify thresholds were properly evaluated and commitments honored?

Race-Dynamic Pressure

When competitors don’t honor similar commitments, defection becomes individually rational. International coordination is required to make if-then commitments stable across all major actors.

Connection to Wiki

If-then commitments are the bridge between strategy (Ch.3) and governance (Ch.4).
