AI Risk Management

AI risk management is the operational framework for maintaining (risks − mitigations) below acceptable tolerance within an AI development organization. The AI Safety Atlas (Ch. 3.6) treats this as one of the five socio-technical strategies, operationalizing technical safety work into organizational decision-making.
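
A minimal sketch of that framing (names and numbers are hypothetical, not taken from the Atlas): the managed quantity is residual risk, i.e. estimated risk minus the estimated effect of mitigations, and the goal is to keep it below a stated tolerance.

```python
from dataclasses import dataclass

@dataclass
class RiskEstimate:
    """One tracked risk; values are illustrative and units arbitrary."""
    name: str
    inherent_risk: float      # estimated probability-weighted harm before controls
    mitigation_effect: float  # estimated reduction provided by controls

    @property
    def residual_risk(self) -> float:
        # The quantity the framework manages: risks minus mitigations.
        return self.inherent_risk - self.mitigation_effect

def within_tolerance(estimate: RiskEstimate, tolerance: float) -> bool:
    """True when residual risk stays within the organization's tolerance."""
    return estimate.residual_risk <= tolerance

cbrn = RiskEstimate("CBRN uplift", inherent_risk=0.8, mitigation_effect=0.6)
print(within_tolerance(cbrn, tolerance=0.3))  # True: residual 0.2 <= 0.3
```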

The framework draws on established risk-management practices in safety-critical industries (nuclear, aviation, pharmaceuticals) and adapts them for AI. SaferAI provides a comparative analysis of how frontier AI labs measure up against these practices.

Four Components

1. Identification

Classify risks across known categories (a risk-register sketch appears at the end of this subsection):

  • Cybersecurity — model misuse for cyber attacks
  • CBRN — chemical, biological, radiological, nuclear weapons assistance
  • Manipulation — persuasion, deepfakes, disinformation
  • Autonomous replication — see autonomous-replication

Plus:

  • Red teaming to discover unknown risks (the various-redteams SR2025 agenda)
  • Risk models mapping capability pathways to real-world harms
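
One way to make the identification step concrete (the schema and example entry below are illustrative assumptions, not the Atlas's format): a small risk register that tags each identified risk with its category, the capability pathway, and the real-world harm it maps to.

```python
from dataclasses import dataclass
from enum import Enum, auto

class RiskCategory(Enum):
    # The known categories listed above; red teaming feeds newly discovered risks in.
    CYBERSECURITY = auto()
    CBRN = auto()
    MANIPULATION = auto()
    AUTONOMOUS_REPLICATION = auto()

@dataclass
class RiskModelEntry:
    """One identified risk: a capability pathway mapped to a real-world harm."""
    category: RiskCategory
    capability: str      # e.g. automated vulnerability discovery
    harm_pathway: str    # e.g. scaled exploitation of critical infrastructure
    discovered_via: str  # "known category" or "red teaming"

register = [
    RiskModelEntry(RiskCategory.CYBERSECURITY,
                   capability="automated vulnerability discovery",
                   harm_pathway="scaled exploitation of critical infrastructure",
                   discovered_via="known category"),
]
```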

2. Analysis

Quantify what's tolerable and where the lines are drawn (a code sketch of how these pieces combine follows the RSP note below):

  • Quantitative risk-tolerance thresholds (organizational, not just technical)
  • Key Risk Indicators (KRIs) = capability thresholds — what capability levels trigger heightened response
  • Key Control Indicators (KCIs) = mitigation targets — what protective measures must be in place at each capability level
  • Pause commitments: organizations commit to halting development if required controls cannot be achieved

This is the structural backbone of Anthropic's Responsible Scaling Policy (RSP); an RSP is essentially a public KRI/KCI commitment.
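
A minimal sketch of how these pieces could fit together (the ladder, thresholds, and "ASL-like" level names are hypothetical, not Anthropic's actual policy): each KRI is a capability threshold, each level lists the KCIs that must be in place once that threshold is crossed, and a pause is implied whenever a required control is missing.

```python
from dataclasses import dataclass

@dataclass
class CapabilityLevel:
    """One rung of a hypothetical KRI/KCI ladder (illustrative values only)."""
    name: str
    kri_threshold: float      # capability eval score that triggers this level
    required_kcis: list[str]  # controls that must be in place at this level

LADDER = [
    CapabilityLevel("ASL-2-like", kri_threshold=0.3,
                    required_kcis=["basic deployment monitoring"]),
    CapabilityLevel("ASL-3-like", kri_threshold=0.6,
                    required_kcis=["weight security", "misuse refusal evals"]),
]

def decide(capability_score: float, kcis_in_place: set[str]) -> str:
    """Return the development decision implied by the KRI/KCI commitment."""
    for level in sorted(LADDER, key=lambda l: l.kri_threshold, reverse=True):
        if capability_score >= level.kri_threshold:
            missing = [k for k in level.required_kcis if k not in kcis_in_place]
            # Pause commitment: halt if required controls cannot be achieved.
            return f"pause: missing {missing}" if missing else f"proceed at {level.name}"
    return "proceed: below all KRI thresholds"

print(decide(0.65, {"weight security"}))  # pause: missing ['misuse refusal evals']
```

In this framing, publishing the ladder and the decision rule is what turns an internal practice into the kind of public commitment an RSP represents.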

3. Treatment

Implement protective measures:

  • Containment — control access (the internal access controls)
  • Deployment measures — prevent misuse (API gating, rate limiting, monitoring)
  • Assurance processes — produce safety evidence for stakeholders

Continuous monitoring of both (sketched in code after this list):

  • KRIs (detecting when dangerous capabilities emerge)
  • KCIs (verifying mitigations remain effective)
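
A sketch of a single monitoring pass (check functions, names, and thresholds are assumed for illustration): it flags KRIs that have crossed their thresholds and KCIs that are no longer effective, and escalates either.

```python
from typing import Callable

# Hypothetical checks; each KRI entry pairs a measurement with its trigger threshold.
KRI_CHECKS: dict[str, tuple[Callable[[], float], float]] = {
    "cyber eval score": (lambda: 0.42, 0.6),
}
# Each KCI entry verifies that a mitigation is still effective.
KCI_CHECKS: dict[str, Callable[[], bool]] = {
    "API misuse monitoring": lambda: True,
    "rate limiting": lambda: True,
}

def monitoring_pass() -> list[str]:
    """One pass of continuous monitoring; returns alerts that need escalation."""
    alerts = []
    for name, (measure, threshold) in KRI_CHECKS.items():
        if measure() >= threshold:
            alerts.append(f"KRI crossed: {name}")
    for name, still_effective in KCI_CHECKS.items():
        if not still_effective():
            alerts.append(f"KCI degraded: {name}")
    return alerts

print(monitoring_pass())  # [] while all KRIs stay below threshold and KCIs hold
```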

4. Governance

Three-lines-of-defense model from financial-industry risk management:

  • First line — operational managers owning daily decisions
  • Second line — specialized risk teams advising and challenging decisions
  • Third line — independent audit verifying that the system functions

This separation matters: first-line teams have skin in the game (they want their projects to ship); the second line provides expertise and pushback; the third line validates the whole system.

Why Three Lines

The Atlas's implicit argument: any single line fails under pressure. The first line alone is biased toward shipping. The second line alone can be marginalized or co-opted. The third line alone arrives too late. Together, the three lines compose a system more robust than any single component.

What Adoption Looks Like in Practice

A frontier AI lab with serious risk management would:

  • Publish KRI/KCI frameworks (like RSPs)
  • Have a Chief Risk Officer reporting independently to the board
  • Have safety teams with veto authority over deployment, not just advisory roles
  • Run periodic external audits with public summaries
  • Maintain detailed incident logs and post-mortems
  • Demonstrate willingness to delay shipping for safety reasons

The Atlas's implicit critique: most labs implement this only partially. SaferAI's comparative analysis quantifies the gap.

Connection to AI-Specific Challenges

Standard risk management assumes:

  • Risks can be quantified with reasonable confidence
  • Mitigations have measurable effectiveness
  • Models for harm pathways exist

AI safety challenges all three:

  • Pre-paradigmatic field — risk quantification is contested
  • Mitigation effectiveness is hard to verify (can’t test alignment empirically before deployment)
  • Harm pathways for emergent capabilities are not fully known

This is why AI risk management is not just standard risk management with AI labels — it requires explicit acknowledgment of irreducible uncertainty, plus pause commitments for cases where uncertainty is too high.
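
One way to make that concrete (an illustrative rule, not a method prescribed by the Atlas): treat the risk estimate as an interval rather than a point, and trigger the pause when the pessimistic bound, not the central estimate, exceeds tolerance.

```python
def decision_under_uncertainty(residual_risk: float,
                               uncertainty: float,
                               tolerance: float) -> str:
    """Illustrative rule: act on the pessimistic bound of the risk estimate."""
    upper_bound = residual_risk + uncertainty
    if upper_bound <= tolerance:
        return "proceed"
    if residual_risk <= tolerance:
        # Central estimate looks tolerable, but uncertainty is too wide to rely on.
        return "pause: reduce uncertainty before proceeding"
    return "pause: residual risk above tolerance"

print(decision_under_uncertainty(residual_risk=0.2, uncertainty=0.3, tolerance=0.3))
# pause: reduce uncertainty before proceeding
```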
