AI Safety Levels (ASL)
AI Safety Levels (ASL) are Anthropic’s framework of standardized capability tiers, each demanding increasingly rigorous safety measures. The structure is the operational backbone of Anthropic’s Responsible Scaling Policy (RSP) and is modeled on the biosafety levels used in infectious disease research, where increasingly dangerous pathogens require increasingly stringent containment.
The Levels
| Level | Description |
|---|---|
| ASL-1 | Systems posing no meaningful catastrophic risk |
| ASL-2 | Shows early signs of dangerous capabilities (e.g., information relevant to bioweapon assembly, but not beyond what search engines already provide) |
| ASL-3 | Substantially increases catastrophic misuse risk vs. non-AI baselines OR shows low-level autonomous capabilities |
| ASL-4 | Not yet defined — qualitative escalation in misuse potential and autonomy |
| ASL-5+ | Not yet defined — distance from current systems precludes specification |
The undefined upper levels are intentional: ASL-4/5 will involve qualitative jumps, not just quantitative escalations of ASL-3.
How ASLs Trigger Action
Each level corresponds to required safeguards:
- ASL-1 — basic safety measures
- ASL-2 — capability-evaluation requirements + standard security
- ASL-3 — comprehensive security and deployment restrictions, including escalating protections for model weights
When a model crosses an ASL threshold, the additional safeguards must already be in place: this is the if-then-commitments pattern made operational. A minimal sketch of the gating logic follows.
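As a rough illustration of that if-then gating logic (not Anthropic's actual tooling or safeguard list), the Python sketch below maps hypothetical ASL classifications to required safeguards and blocks further scaling or deployment until every required safeguard is in place. All level definitions, safeguard names, and functions here are illustrative assumptions.

```python
from enum import IntEnum

class ASL(IntEnum):
    """Illustrative ASL tiers; the real definitions live in Anthropic's RSP."""
    ASL_1 = 1
    ASL_2 = 2
    ASL_3 = 3

# Hypothetical safeguard requirements per level; purely for illustration.
REQUIRED_SAFEGUARDS = {
    ASL.ASL_1: {"basic_safety_measures"},
    ASL.ASL_2: {"capability_evaluations", "standard_security"},
    ASL.ASL_3: {"hardened_security", "deployment_restrictions", "weight_protection"},
}

def safeguards_for(level: ASL) -> set:
    """All safeguards required at or below the given level (requirements accumulate)."""
    required = set()
    for tier in ASL:
        if tier <= level:
            required |= REQUIRED_SAFEGUARDS[tier]
    return required

def may_proceed(assessed_level: ASL, safeguards_in_place: set) -> bool:
    """If the model is assessed at `assessed_level`, then every required
    safeguard must already be implemented before scaling or deployment."""
    missing = safeguards_for(assessed_level) - safeguards_in_place
    if missing:
        print(f"Blocked: missing safeguards {sorted(missing)}")
        return False
    return True

# Example: a model assessed at ASL-3 with only ASL-1/2 safeguards is blocked.
print(may_proceed(ASL.ASL_3, {"basic_safety_measures",
                              "capability_evaluations",
                              "standard_security"}))
```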
Required Evaluation Categories
ASLs require multiple evaluation categories working together:
- Capability evaluations — detect dangerous abilities (autonomous replication, CBRN, cyberattack)
- Safety evaluations — verify control measures remain effective
Both categories must work together: passing capability evaluations is not sufficient if safety evaluations indicate that control measures are no longer effective. A hedged sketch of this joint gate follows.
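To make the joint gate concrete, here is a minimal hypothetical sketch; the evaluation names and pass/fail structure are assumptions, not Anthropic's actual evaluation suite. Deployment under the current safeguard regime is blocked if any capability flag exceeds what those safeguards cover, or if any safety check fails.

```python
from dataclasses import dataclass

@dataclass
class EvaluationResults:
    """Hypothetical container for one evaluation round (names are illustrative)."""
    capability_flags: dict   # e.g. {"autonomous_replication": False, "cbrn_uplift": False}
    safety_checks: dict      # e.g. {"control_measures_effective": True}

def evaluation_gate(results: EvaluationResults) -> bool:
    """Both categories gate together: a tripped capability flag or a failed
    safety check each independently blocks deployment under current safeguards."""
    no_capability_concerns = not any(results.capability_flags.values())
    all_safety_checks_pass = all(results.safety_checks.values())
    return no_capability_concerns and all_safety_checks_pass

# Passing capability evaluations alone is not enough: a failed safety check still blocks.
print(evaluation_gate(EvaluationResults(
    capability_flags={"autonomous_replication": False, "cbrn_uplift": False},
    safety_checks={"control_measures_effective": False},
)))  # -> False
```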
Comparison with Other Frameworks
The ASL approach has parallels in:
- OpenAI Preparedness Framework — uses per-category risk ratings (e.g., cybersecurity, persuasion, autonomous replication) on a low-to-critical scale rather than fixed levels
- DeepMind Frontier Safety Framework — uses Critical Capability Levels (CCLs) per domain (bio, cyber, autonomy) rather than unified levels
| Framework | Structure | Threshold-trigger pattern |
|---|---|---|
| Anthropic ASL | Unified levels (ASL-1 → 5+) | Cross level → required safeguards |
| OpenAI PF | Per-category risk (low → critical) | Pre-/post-mitigation evaluation |
| DeepMind FSF | Per-domain CCLs | Domain-specific mitigation combinations |
All three are governance variants of the same underlying pattern: gated scaling based on evaluation results.
The Biosafety Analogy
The biosafety analogy is pedagogically useful:
- BSL-1 — minimal risk, basic procedures
- BSL-2 — moderate risk, restricted access
- BSL-3 — serious disease, biocontainment lab
- BSL-4 — life-threatening, maximum biocontainment
The pattern: standardized risk classification + standardized containment requirements + clear escalation triggers. ASL applies the same structure to AI capabilities.
The analogy has limits. AI capabilities are harder to characterize than pathogens, which have stable biological properties: a given pathogen stays at its classification, while an AI system can move up the ladder unpredictably as models are scaled or given new tools.
Connection to Wiki
- responsible-scaling-policy — the policy document that defines and operationalizes the ASL system
- evaluation-frameworks — ASL is one governance framework
- frontier-safety-frameworks — the multi-company cluster of frameworks to which ASL belongs
- if-then-commitments — the operational pattern
- capability-evaluations — the evidence base for ASL classification
- anthropic — the org behind ASL
Related Pages
- responsible-scaling-policy
- evaluation-frameworks
- frontier-safety-frameworks
- if-then-commitments
- capability-evaluations
- anthropic
- ai-safety-atlas-textbook
- atlas-ch5-evaluations-07-evaluation-frameworks
Sources cited
Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running, not by hand.
- AI Safety Atlas Ch.5 — Evaluation Frameworks — referenced as [[atlas-ch5-evaluations-07-evaluation-frameworks]]