If-Then Commitments
If-then commitments are a governance pattern in which specific safety measures are committed to in advance and triggered when AI systems reach predefined capability thresholds. Rather than fixed rules that age poorly as capabilities advance, if-then commitments adapt to actual demonstrated capability — making them well-suited to the adaptive governance paradigm.
The AI Safety Atlas introduces the term in Ch.3.6 (socio-technical strategies) and elaborates in Ch.4.
The Pattern
IF [model crosses capability threshold X]
THEN [additional safety measures Y must be in place]
ELSE [development pauses until measures are achievable]
Examples:
- IF model can uplift novice bioweapon synthesis → THEN deploy WSL3+ weight security
- IF model crosses autonomous AI R&D threshold → THEN deploy AI control protocols + external audit
- IF model demonstrates situational awareness during evaluation → THEN halt and review
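The pattern above can be sketched as a simple lookup from capability thresholds to required controls, with a pause as the fallback. This is a minimal illustration, not any lab's actual implementation; the threshold and control names are hypothetical labels mirroring the examples above.

```python
from dataclasses import dataclass

@dataclass
class Commitment:
    """One if-then commitment: a capability threshold paired with required controls."""
    threshold: str                 # hypothetical capability-threshold label
    required_controls: frozenset   # controls that must be deployed before crossing it

# Hypothetical commitments mirroring the examples above.
COMMITMENTS = [
    Commitment("bioweapon-uplift", frozenset({"WSL3-weight-security"})),
    Commitment("autonomous-ai-rnd", frozenset({"ai-control-protocols", "external-audit"})),
]

def check(crossed_thresholds: set, deployed_controls: set) -> str:
    """Return 'proceed' if every triggered commitment is satisfied, else 'pause'."""
    for c in COMMITMENTS:
        # IF the threshold is crossed THEN the controls must already be in place,
        # ELSE development pauses until the measures are achievable.
        if c.threshold in crossed_thresholds and not c.required_controls <= deployed_controls:
            return "pause"
    return "proceed"
```

For instance, `check({"bioweapon-uplift"}, set())` returns `"pause"`, while the same threshold with `{"WSL3-weight-security"}` deployed returns `"proceed"`.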
Why If-Then > Fixed Rules
Three structural advantages over traditional fixed regulations:
Adapts to Capability, Not Calendar
Fixed rules (“safety measures X required from January 2026”) become irrelevant as capabilities advance unpredictably. If-then commitments trigger when capability actually emerges — appropriate response to a system whose capabilities surprise even its developers (the unexpected-capabilities problem).
Avoids Premature Constraint
Fixed rules either over-constrain (slowing beneficial work) or under-constrain (failing to constrain dangerous work). If-then commitments proportion the response to actual demonstrated risk.
Clear Pause Conditions
Defining “what would force a pause” before the moment of decision avoids race-dynamic pressure to keep going. Pre-commitment acts as a commitment device against future temptation.
Where If-Then Commitments Appear
Corporate Level: Frontier Safety Frameworks
FSFs are essentially structured if-then commitments:
- Anthropic’s RSP — the canonical instance
- OpenAI’s Preparedness Framework
- DeepMind’s Frontier Safety Framework
- 9 others as of March 2025
National Level
EU AI Act includes capability-conditional provisions (training run thresholds at 10²⁵ FLOPs trigger risk assessments) — partial if-then logic.
International Level
AI Red Lines (ai-red-lines) — internationally agreed prohibitions on specific dangerous capabilities, with sanctions if crossed. Effectively if-then at the international level.
The Operational Backbone
If-then commitments require infrastructure to function:
- Capability evaluations (capability-evaluations) — to detect threshold crossings
- KRI/KCI structure (ai-risk-management) — Key Risk Indicators paired with Key Control Indicators
- Pause protocols — operational procedures for actually halting work
- Verification mechanisms — how outsiders confirm thresholds + commitments
- Accountability — what happens when commitments aren’t honored
Without these, if-then commitments are aspirational rather than operational.
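The KRI/KCI pairing in the list above can be made concrete with a small sketch: a Key Risk Indicator measures how close a model is to a threshold, and its paired Key Control Indicator measures whether the matching control is actually in place. The metric names and numbers here are hypothetical, chosen only to illustrate the pairing logic.

```python
def commitment_satisfied(kri_value: float, kri_threshold: float,
                         kci_value: float, kci_required: float) -> bool:
    """A commitment holds if either the KRI has not crossed its threshold,
    or the paired KCI meets its requirement. (All values hypothetical.)"""
    if kri_value < kri_threshold:
        return True   # threshold not crossed: the extra control is not yet required
    return kci_value >= kci_required

# e.g. an eval score of 0.8 crosses a hypothetical 0.7 KRI threshold,
# so a security-level KCI of at least 3 must be demonstrated.
```

With these example numbers, `commitment_satisfied(0.8, 0.7, 3, 3)` is `True`, while `commitment_satisfied(0.8, 0.7, 2, 3)` is `False` — the commitment is violated and the pause protocol would apply.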
Limitations
Threshold Definition Difficulty
What counts as “uplift to bioweapon synthesis”? Capability evaluations are imperfect, and threshold definitions remain contested.
Sandbagging Concerns
Models with situational awareness could deliberately underperform on evaluations to avoid triggering thresholds. The SR2025 sandbagging-evals agenda specifically tests for this.
Verification
Without privileged access, how does the public verify thresholds were properly evaluated and commitments honored?
Race-Dynamic Pressure
When competitors don’t honor similar commitments, defection becomes individually rational. International coordination is required to make if-then commitments stable across all major actors.
Connection to Wiki
If-then commitments are the bridge between strategy (Ch.3) and governance (Ch.4):
- frontier-safety-frameworks — corporate implementation
- responsible-scaling-policy — canonical example
- ai-risk-management — operational structure (KRIs/KCIs)
- capability-evaluations — the evaluation backbone
- ai-red-lines — international-level analog
- ai-governance — adaptive governance frame
- risk-amplifiers — race dynamics undermine voluntary if-then
- atlas-ch4-governance-03-systemic-challenges, atlas-ch3-strategies-06-socio-technical-strategies — primary sources
Related Pages
- frontier-safety-frameworks
- responsible-scaling-policy
- ai-risk-management
- capability-evaluations
- ai-red-lines
- ai-governance
- ai-safety-culture
- risk-amplifiers
- sandbagging-evals
- ai-safety-atlas-textbook
- atlas-ch4-governance-03-systemic-challenges
- atlas-ch3-strategies-06-socio-technical-strategies
Sources cited
Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running, not by hand.
- AI Safety Atlas Ch.3 — Socio-Technical Strategies — referenced as [[atlas-ch3-strategies-06-socio-technical-strategies]]
- AI Safety Atlas Ch.4 — Governance Problems — referenced as [[atlas-ch4-governance-01-governance-problems]]
- AI Safety Atlas Ch.4 — Systemic Challenges — referenced as [[atlas-ch4-governance-03-systemic-challenges]]