AI Safety Levels (ASL)

AI Safety Levels (ASL) are Anthropic’s framework of standardized capability tiers, each demanding increasingly rigorous safety measures. The structure is the operational backbone of Anthropic’s Responsible Scaling Policy (RSP) and is inspired by the biosafety levels used in infectious disease research, where increasingly dangerous pathogens require increasingly stringent containment.

The Levels

| Level | Description |
|---|---|
| ASL-1 | Systems posing no meaningful catastrophic risk |
| ASL-2 | Early signs of dangerous capabilities (e.g., bioweapon assembly information, but not exceeding what search engines already provide) |
| ASL-3 | Substantially increases catastrophic misuse risk relative to non-AI baselines, or shows low-level autonomous capabilities |
| ASL-4 | Not yet defined; expected to involve a qualitative escalation in misuse potential and autonomy |
| ASL-5+ | Not yet defined; too distant from current systems to specify |

The undefined upper levels are intentional: ASL-4 and above are expected to involve qualitative jumps in risk, not merely quantitative escalations of ASL-3.

How ASLs Trigger Action

Each level corresponds to required safeguards:

  • ASL-1 — basic safety measures
  • ASL-2 — capability-evaluation requirements + standard security
  • ASL-3 — comprehensive security and deployment restrictions, including stronger protection of model weights

When a model crosses an ASL threshold, the additional safeguards must already be in place before scaling or deployment continues; this operationalizes the if-then commitments pattern.
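The if-then structure can be sketched in code. This is an illustrative sketch only: the level names follow the RSP, but the safeguard labels and the mapping below are hypothetical, not Anthropic's actual requirements.

```python
from enum import IntEnum

class ASL(IntEnum):
    ASL_1 = 1
    ASL_2 = 2
    ASL_3 = 3

# Hypothetical mapping from level to required safeguards; each level
# inherits everything required by the levels below it.
REQUIRED = {
    ASL.ASL_1: {"basic_safety"},
    ASL.ASL_2: {"basic_safety", "capability_evals", "standard_security"},
    ASL.ASL_3: {"basic_safety", "capability_evals", "standard_security",
                "enhanced_security", "deployment_restrictions"},
}

def safeguards_satisfied(level: ASL, in_place: set[str]) -> bool:
    """If a model is assessed at `level`, then every safeguard required
    at that level must already be in place."""
    return REQUIRED[level] <= in_place

# An ASL-3 model operating with only ASL-2 safeguards fails the gate.
assert not safeguards_satisfied(ASL.ASL_3, REQUIRED[ASL.ASL_2])
assert safeguards_satisfied(ASL.ASL_2, REQUIRED[ASL.ASL_2])
```

The subset check captures the commitment's direction: crossing a threshold does not itself block scaling, but scaling without the matching safeguards does.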

Required Evaluation Categories

ASLs require multiple evaluation categories working together:

  • Capability evaluations — detect dangerous abilities (autonomous replication, CBRN, cyberattack)
  • Safety evaluations — verify control measures remain effective

The two categories form a joint gate: passing capability evaluations is insufficient if safety evaluations indicate that control measures are no longer effective.
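The joint-gate requirement can be sketched as follows; the evaluation names are illustrative placeholders, not Anthropic's actual test suite.

```python
# Capability evals flag dangerous abilities; safety evals confirm that
# control measures still work. Both conditions must hold together.
def clears_gate(capability_flags: dict[str, bool],
                safety_checks: dict[str, bool]) -> bool:
    """True only when no dangerous capability was detected AND every
    safety evaluation confirms the controls remain effective."""
    no_dangerous_capability = not any(capability_flags.values())
    controls_effective = all(safety_checks.values())
    return no_dangerous_capability and controls_effective

# Clean capability results do not help if a safety check fails.
assert not clears_gate(
    {"autonomous_replication": False, "cbrn_uplift": False},
    {"refusal_robustness": False},
)
```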

Comparison with Other Frameworks

The ASL approach has parallels in:

  • OpenAI Preparedness Framework — uses risk categories (cybersecurity, persuasion, autonomous replication) with low-to-critical spectrums rather than fixed levels
  • DeepMind Frontier Safety Framework — uses Critical Capability Levels (CCLs) per domain (bio, cyber, autonomy) rather than unified levels

| Framework | Structure | Threshold-trigger pattern |
|---|---|---|
| Anthropic ASL | Unified levels (ASL-1 → 5+) | Crossing a level triggers required safeguards |
| OpenAI PF | Per-category risk (low → critical) | Pre-/post-mitigation evaluation |
| DeepMind FSF | Per-domain CCLs | Domain-specific mitigation combinations |

All three are governance variants of the same underlying pattern: gated scaling based on evaluation results.
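The shared pattern can be expressed generically. The scores, thresholds, and mitigation names below are illustrative placeholders, not values from any of the three frameworks.

```python
# Gated scaling: every evaluation that crosses its threshold must have
# a matching mitigation in place before scaling proceeds.
def gate_scaling(scores: dict[str, float],
                 thresholds: dict[str, float],
                 mitigations_ready: set[str]) -> bool:
    triggered = {name for name, s in scores.items() if s >= thresholds[name]}
    return triggered <= mitigations_ready

# Per-category (OpenAI-style) or per-domain (DeepMind-style) keys fit the
# same pattern; a unified-level scheme is the single-key special case.
assert gate_scaling({"cyber": 0.2, "bio": 0.9},
                    {"cyber": 0.5, "bio": 0.8},
                    {"bio"})
assert not gate_scaling({"cyber": 0.6, "bio": 0.9},
                        {"cyber": 0.5, "bio": 0.8},
                        {"bio"})
```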

The Biosafety Analogy

The biosafety analogy is pedagogically useful:

  • BSL-1 — minimal risk, basic procedures
  • BSL-2 — moderate risk, restricted access
  • BSL-3 — serious disease, biocontainment lab
  • BSL-4 — life-threatening, maximum biocontainment

The pattern: standardized risk classification + standardized containment requirements + clear escalation triggers. ASL applies the same structure to AI capabilities.

The analogy has limits. Pathogens have stable biological properties, so their classification rarely changes; AI capabilities are harder to characterize and can move up the ladder unpredictably as models are scaled or fine-tuned.
