Frontier Safety Frameworks
Frontier Safety Frameworks (FSFs) are publicly released corporate AI safety policies that define capability thresholds, security protocols, and pause commitments for advanced AI development. The AI Safety Atlas (Ch.4.4) reports that as of March 2025, twelve major companies had published comprehensive FSFs.
Major Companies with FSFs (March 2025)
Twelve companies, including:
- Anthropic — Responsible Scaling Policy (the canonical FSF)
- OpenAI — Preparedness Framework
- Google DeepMind — Frontier Safety Framework
- Meta — Frontier AI Framework
- Microsoft
Plus several others. The list is growing as governance pressure mounts.
Common Framework Elements
Across FSFs, recurring components:
Capability Thresholds Triggering Enhanced Safeguards
Specific capabilities that, when reached, require additional safety measures:
- Biological weapons assistance — uplift thresholds for novice or expert actors
- Cyber offensive capabilities — autonomous exploitation, vulnerability discovery
- Automated AI research — self-improvement, ML R&D acceleration
These thresholds operationalize if-then commitments at the corporate level.
Model Weight Security
Security protocols scaling with capability advancement:
- Lower thresholds → standard security
- Higher thresholds → state-actor-resistant security (corresponding to SSL frameworks)
Pause Conditions
Explicit conditions for pausing development if thresholds are crossed without corresponding controls.
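Taken together, the capability thresholds, tiered security, and pause conditions form a simple conditional structure. The following is a minimal sketch of how that if-then logic might be encoded; the threshold names, security tiers, and functions are hypothetical illustrations, not drawn from any company's actual FSF.

```python
from dataclasses import dataclass

# Hypothetical threshold and safeguard names, for illustration only --
# real FSFs define these in policy text, not code.
SECURITY_TIERS = ("standard", "enhanced", "state-actor-resistant")

@dataclass
class Threshold:
    name: str                # e.g. "bio-uplift", "autonomous-cyber"
    crossed: bool            # outcome of capability evaluations
    required_security: str   # the safeguard tier the FSF demands once crossed

@dataclass
class DeploymentState:
    current_security: str    # the security tier actually in place

def meets(current: str, required: str) -> bool:
    """A security tier satisfies a requirement if it is at least as strict."""
    return SECURITY_TIERS.index(current) >= SECURITY_TIERS.index(required)

def fsf_decision(thresholds: list[Threshold], state: DeploymentState) -> str:
    """If any threshold is crossed without its required safeguard in place,
    the if-then commitment resolves to a pause; otherwise work proceeds."""
    for t in thresholds:
        if t.crossed and not meets(state.current_security, t.required_security):
            return f"pause: {t.name} crossed without {t.required_security} security"
    return "proceed"

# Example: a crossed bio-uplift threshold with only standard security triggers a pause.
print(fsf_decision(
    [Threshold("bio-uplift", crossed=True, required_security="state-actor-resistant")],
    DeploymentState(current_security="standard"),
))
```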
Evaluation Schedules
- Pre-deployment evaluation
- During-training evaluation
- Post-deployment monitoring
Accountability Mechanisms
- Whistleblower protections
- Board-level reporting
- Public disclosure requirements
How FSFs Operationalize Strategy
FSFs are the corporate-level implementation of:
- KRI/KCI risk management — capability thresholds = KRIs; required safeguards = KCIs
- if-then-commitments — capability-conditional commitments
- ai-safety-culture — embedding safety in operational decisions
- capability-evaluations — the empirical engine that determines whether thresholds have been crossed
Limitations
Voluntary
FSFs are self-imposed. Without external enforcement, they can be:
- Modified or abandoned under competitive pressure
- Weakened through redefinition of thresholds
- Treated as marketing rather than an operational constraint (the safety-washing concern)
Inconsistent
Different companies use different threshold definitions, evaluation methodologies, and security tiers, which makes comparison across FSFs difficult.
Single-Company Scope
Even if fully honored, an FSF governs only one company. Systemic risks from competitive dynamics aren’t addressed.
Verification
External parties cannot easily verify whether internal commitments are honored without privileged access to the company’s evaluations and decisions.
Why FSFs Matter Despite Limitations
The Atlas’s argument: FSFs are imperfect but valuable as:
- Bridge to regulation — governments often codify what leading companies are already doing
- Industry coordination point — a public common framework reduces race-to-the-bottom
- Internal accountability — published commitments create pressure on internal decision-making
- Demonstration — proves operational feasibility of safety measures
The Brussels Effect applies in reverse here: publicly committed corporate practices can become regulatory floors.
Connection to Wiki
- responsible-scaling-policy — Anthropic’s FSF, the canonical instance
- if-then-commitments — the operational logic
- ai-risk-management — KRI/KCI framework
- capability-evaluations — the evaluation backbone
- ai-safety-culture — what FSFs partially codify
- governance-architectures — corporate layer
- risk-amplifiers — race dynamics undermine voluntary FSFs
- atlas-ch4-governance-04-governance-architectures — primary source
Related Pages
- responsible-scaling-policy
- if-then-commitments
- ai-risk-management
- capability-evaluations
- ai-safety-culture
- governance-architectures
- risk-amplifiers
- anthropic
- openai
- deepmind
- ai-safety-atlas-textbook
- atlas-ch4-governance-04-governance-architectures
Sources cited
Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running, not by hand.
- AI Safety Atlas Ch.3 — Misuse Prevention Strategies — referenced as [[atlas-ch3-strategies-03-misuse-prevention-strategies]]
- AI Safety Atlas Ch.4 — Governance Architectures — referenced as [[atlas-ch4-governance-04-governance-architectures]]