Frontier Safety Frameworks
Frontier Safety Frameworks (FSFs) are publicly released corporate AI safety policies that define capability thresholds, security protocols, and pause commitments for advanced AI development. The AI Safety Atlas (Ch.4.4) reports that as of March 2025, twelve major companies had published comprehensive FSFs.
Major Companies with FSFs (March 2025)
Twelve companies, including:
- Anthropic — Responsible Scaling Policy (the canonical FSF)
- OpenAI — Preparedness Framework
- Google DeepMind — Frontier Safety Framework
- Meta — Frontier AI Framework
- Microsoft
Plus several others. The list is growing as governance pressure mounts.
Common Framework Elements
Across FSFs, recurring components:
Capability Thresholds Triggering Enhanced Safeguards
Specific capabilities that, when reached, require additional safety measures:
- Biological weapons assistance — uplift thresholds for novice or expert actors
- Cyber offensive capabilities — autonomous exploitation, vulnerability discovery
- Automated AI research — self-improvement, ML R&D acceleration
These thresholds operationalize if-then commitments at the corporate level.
Model Weight Security
Security protocols scaling with capability advancement:
- Lower thresholds → standard security
- Higher thresholds → state-actor-resistant security (corresponding to SSL frameworks)
Pause Conditions
Explicit conditions for pausing development if thresholds are crossed without corresponding controls.
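Taken together, the capability thresholds, tiered security, and pause conditions form a simple conditional structure. The following is a minimal sketch of how that if-then logic might be encoded; the threshold names, security tiers, and functions are hypothetical illustrations, not drawn from any company's actual FSF.

```python
from dataclasses import dataclass

# Hypothetical threshold and safeguard names, for illustration only --
# real FSFs define these in policy text, not code.
SECURITY_TIERS = ("standard", "enhanced", "state-actor-resistant")

@dataclass
class Threshold:
    name: str                # e.g. "bio-uplift", "autonomous-cyber"
    crossed: bool            # outcome of capability evaluations
    required_security: str   # the safeguard tier the FSF demands once crossed

@dataclass
class DeploymentState:
    current_security: str    # the security tier actually in place

def meets(current: str, required: str) -> bool:
    """A security tier satisfies a requirement if it is at least as strict."""
    return SECURITY_TIERS.index(current) >= SECURITY_TIERS.index(required)

def fsf_decision(thresholds: list[Threshold], state: DeploymentState) -> str:
    """If any threshold is crossed without its required safeguard in place,
    the if-then commitment resolves to a pause; otherwise work proceeds."""
    for t in thresholds:
        if t.crossed and not meets(state.current_security, t.required_security):
            return f"pause: {t.name} crossed without {t.required_security} security"
    return "proceed"

# Example: a crossed bio-uplift threshold with only standard security triggers a pause.
print(fsf_decision(
    [Threshold("bio-uplift", crossed=True, required_security="state-actor-resistant")],
    DeploymentState(current_security="standard"),
))
```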
Evaluation Schedules
- Pre-deployment evaluation
- During-training evaluation
- Post-deployment monitoring
Accountability Mechanisms
- Whistleblower protections
- Board-level reporting
- Public disclosure requirements
How FSFs Operationalize Strategy
FSFs are the corporate-level implementation of:
- KRI/KCI risk management — capability thresholds = KRIs; required safeguards = KCIs
- if-then-commitments — capability-conditional commitments
- ai-safety-culture — embedding safety in operational decisions
- capability-evaluations — the empirical engine that determines whether thresholds have been crossed
Limitations
Voluntary
FSFs are self-imposed. Without external enforcement, they can be:
- Modified or abandoned under competitive pressure
- Weakened through redefinition of thresholds
- Treated as marketing rather than an operational constraint (the safety-washing concern)
Inconsistent
Different companies use different threshold definitions, evaluation methodologies, and security tiers, which makes comparison across FSFs difficult.
Single-Company Scope
Even if fully honored, an FSF governs only one company. Systemic risks from competitive dynamics aren’t addressed.
Verification
External parties cannot easily verify whether internal commitments are honored without privileged access to the company’s evaluations and decisions.
Why FSFs Matter Despite Limitations
The Atlas’s argument: FSFs are imperfect but valuable as:
- Bridge to regulation — governments often codify what leading companies are already doing
- Industry coordination point — a public common framework reduces race-to-the-bottom
- Internal accountability — published commitments create pressure on internal decision-making
- Demonstration — proves operational feasibility of safety measures
The Brussels Effect applies in reverse here: publicly committed corporate practices can become regulatory floors.
Connection to Wiki
- responsible-scaling-policy — Anthropic’s FSF, the canonical instance
- if-then-commitments — the operational logic
- ai-risk-management — KRI/KCI framework
- capability-evaluations — the evaluation backbone
- ai-safety-culture — what FSFs partially codify
- governance-architectures — corporate layer
- risk-amplifiers — race dynamics undermine voluntary FSFs
- atlas-ch4-governance-04-governance-architectures — primary source
Related Pages
- responsible-scaling-policy
- if-then-commitments
- ai-risk-management
- capability-evaluations
- ai-safety-culture
- governance-architectures
- risk-amplifiers
- anthropic
- openai
- deepmind
- ai-safety-atlas-textbook
- atlas-ch4-governance-04-governance-architectures
Sources cited
Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running, not by hand.
- AI Safety Atlas Ch.3 — Misuse Prevention Strategies — referenced as [[atlas-ch3-strategies-03-misuse-prevention-strategies]]
- AI Safety Atlas Ch.4 — Governance Architectures — referenced as [[atlas-ch4-governance-04-governance-architectures]]