AI Risk Management
AI risk management is the operational framework an AI development organization uses to keep residual risk, i.e. risks minus mitigations, below its acceptable tolerance. The AI Safety Atlas (Ch. 3.6) treats this as one of the five socio-technical strategies: the one that operationalizes technical safety work into organizational decision-making.
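A minimal sketch of that acceptability condition, assuming risks and mitigations can be collapsed to single scores on an arbitrary scale (the function names and numbers below are illustrative, not from the Atlas):

```python
def residual_risk(inherent_risk: float, mitigation_effect: float) -> float:
    """Residual risk: risks minus mitigations, floored at zero."""
    return max(inherent_risk - mitigation_effect, 0.0)

def within_tolerance(inherent_risk: float, mitigation_effect: float,
                     risk_tolerance: float) -> bool:
    """The organizational goal: keep residual risk below the acceptable tolerance."""
    return residual_risk(inherent_risk, mitigation_effect) < risk_tolerance

# Illustrative numbers: a risk scored 0.7, mitigations worth 0.5, tolerance 0.3.
assert within_tolerance(0.7, 0.5, 0.3)
```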
The framework draws on established risk-management practice in safety-critical industries (nuclear, aviation, pharmaceuticals) and adapts it for AI. SaferAI provides comparative analysis of how frontier AI labs measure up along these dimensions.
Four Components
1. Identification
Classify risks across known categories:
- Cybersecurity — model misuse for cyber attacks
- CBRN — chemical, biological, radiological, nuclear weapons assistance
- Manipulation — persuasion, deepfakes, disinformation
- Autonomous replication — see autonomous-replication
Plus:
- Red teaming to discover unknown risks (see the various-redteams SR2025 agenda)
- Risk models mapping capability pathways to real-world harms (a register sketch follows this list)
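A minimal risk-register sketch for the identification step; the category enum, field names, and example entry are illustrative assumptions, not the Atlas's schema:

```python
from dataclasses import dataclass
from enum import Enum

class RiskCategory(Enum):
    CYBERSECURITY = "cybersecurity"
    CBRN = "cbrn"
    MANIPULATION = "manipulation"
    AUTONOMOUS_REPLICATION = "autonomous_replication"
    UNKNOWN = "unknown"  # surfaced by red teaming, reclassified once understood

@dataclass
class RiskEntry:
    category: RiskCategory
    capability_pathway: str  # the risk model: how a capability leads to harm
    real_world_harm: str
    discovered_by: str       # e.g. "red team", "capability evaluation"

register = [
    RiskEntry(
        category=RiskCategory.CBRN,
        capability_pathway="step-by-step synthesis assistance",
        real_world_harm="lowered barrier to biological weapons",
        discovered_by="red team",
    ),
]
```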
2. Analysis
Quantify what’s tolerable and where the lines are:
- Quantitative risk-tolerance thresholds (organizational, not just technical)
- Key Risk Indicators (KRIs) = capability thresholds — what capability levels trigger heightened response
- Key Control Indicators (KCIs) = mitigation targets — what protective measures must be in place at each capability level
- Pause commitments: organizations commit to halting development if required controls cannot be achieved
This is the structural backbone of Anthropic’s RSP; the RSP is essentially a public KRI/KCI commitment (a minimal sketch of the structure follows).
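A minimal sketch of how KRIs, KCIs, and a pause commitment compose; the threshold labels, control names, and decision rule are hypothetical, not Anthropic's actual RSP definitions:

```python
# KRI (capability threshold) -> KCI (controls that must be in place at that level).
REQUIRED_CONTROLS: dict[str, set[str]] = {
    "level-2": {"deployment_monitoring"},
    "level-3": {"deployment_monitoring", "weight_security", "misuse_refusals"},
}

def decide(capability_level: str, controls_in_place: set[str]) -> str:
    """Pause commitment: halt development until the required KCIs are met."""
    missing = REQUIRED_CONTROLS[capability_level] - controls_in_place
    if missing:
        return "pause until in place: " + ", ".join(sorted(missing))
    return "proceed"

print(decide("level-3", {"deployment_monitoring"}))
# -> pause until in place: misuse_refusals, weight_security
```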
3. Treatment
Implement protective measures:
- Containment — control access (internal access controls)
- Deployment measures — prevent misuse (API gating, rate limiting, monitoring)
- Assurance processes — produce safety evidence for stakeholders
Continuous monitoring of both (a monitoring sketch follows this list):
- KRIs (detecting when dangerous capabilities emerge)
- KCIs (verifying mitigations remain effective)
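A minimal continuous-monitoring sketch, assuming each indicator can be evaluated by some callable defined elsewhere; the indicator names are hypothetical:

```python
from typing import Callable

# KRI checks return True when a dangerous capability threshold appears crossed.
kri_checks: dict[str, Callable[[], bool]] = {
    "autonomous_replication_eval": lambda: False,
}
# KCI checks return True when the corresponding mitigation is verified effective.
kci_checks: dict[str, Callable[[], bool]] = {
    "api_rate_limiting": lambda: True,
    "weight_access_controls": lambda: True,
}

def monitor() -> list[str]:
    """Flag crossed KRIs and failing KCIs for escalation to the risk owners."""
    alerts = [f"KRI crossed: {name}" for name, check in kri_checks.items() if check()]
    alerts += [f"KCI failing: {name}" for name, check in kci_checks.items() if not check()]
    return alerts

print(monitor() or "all indicators nominal")
```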
4. Governance
Three-lines-of-defense model from financial-industry risk management:
- First line — operational managers owning daily decisions
- Second line — specialized risk teams advising and challenging decisions
- Third line — independent audit verifying that the system functions
This separation matters: first-line teams have skin in the game (they want their projects to ship); second-line provides expertise and pushback; third-line validates the whole system.
Why Three Lines
The Atlas’s implicit argument: any single line fails under pressure. The first line alone is biased toward shipping. The second line alone can be marginalized or co-opted. The third line alone arrives too late. The three lines compose into a system more robust than any one of them.
What Adoption Looks Like in Practice
A frontier AI lab with serious risk management would:
- Publish KRI/KCI frameworks (like RSPs)
- Have a Chief Risk Officer reporting independently to the board
- Have safety teams with veto authority over deployment, not just advisory roles
- Run periodic external audits with public summaries
- Maintain detailed incident logs and post-mortems
- Demonstrate willingness to delay shipping for safety reasons
The Atlas’s implicit critique: most labs partially implement this. SaferAI’s comparative analysis quantifies the gap.
Connection to AI-Specific Challenges
Standard risk management assumes:
- Risks can be quantified with reasonable confidence
- Mitigations have measurable effectiveness
- Models for harm pathways exist
AI safety challenges all three:
- Pre-paradigmatic field — risk quantification is contested
- Mitigation effectiveness is hard to verify (can’t test alignment empirically before deployment)
- Harm pathways for emergent capabilities are not fully known
This is why AI risk management is not just standard risk management with AI labels — it requires explicit acknowledgment of irreducible uncertainty, plus pause commitments for cases where uncertainty is too high.
Connection to Wiki
- socio-technical-strategies — parent category
- ai-safety-culture — cultural counterpart (culture enables risk management; risk management makes culture operational)
- responsible-scaling-policy — canonical KRI/KCI implementation
- capability-evaluations — produces the KRI evidence
- various-redteams — supports identification phase
- risk-amplifiers — race dynamics undermine pause commitments
- atlas-ch3-strategies-06-socio-technical-strategies — primary source
Related Pages
- socio-technical-strategies
- ai-safety-culture
- responsible-scaling-policy
- capability-evaluations
- various-redteams
- risk-amplifiers
- ai-governance
- autonomous-replication
- ai-safety-atlas-textbook
- atlas-ch3-strategies-06-socio-technical-strategies
- atlas-ch3-strategies-07-combining-strategies
Sources cited
Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running, not by hand.
- AI Safety Atlas Ch.3 — Challenges — referenced as [[atlas-ch3-strategies-02-challenges]]
- AI Safety Atlas Ch.3 — Combining Strategies — referenced as [[atlas-ch3-strategies-07-combining-strategies]]
- AI Safety Atlas Ch.3 — Socio-Technical Strategies — referenced as [[atlas-ch3-strategies-06-socio-technical-strategies]]