Frontier Safety Frameworks

Frontier Safety Frameworks (FSFs) are publicly-published corporate AI safety policies that define capability thresholds, security protocols, and pause commitments for advanced AI development. The AI Safety Atlas (Ch.4.4) reports that as of March 2025, twelve major companies have published comprehensive FSFs.

Major Companies with FSFs (March 2025)

Twelve companies, including:

Plus several others. The list is growing as governance pressure mounts.

Common Framework Elements

Across FSFs, recurring components:

Capability Thresholds Triggering Enhanced Safeguards

Specific capabilities that, when reached, require additional safety measures:

  • Biological weapons assistance — uplift threshold for novice/expert
  • Cyber offensive capabilities — autonomous exploitation, vulnerability discovery
  • Automated AI research — self-improvement, ML R&D acceleration

These thresholds operationalize if-then commitments at the corporate level.

Model Weight Security

Security protocols scaling with capability advancement:

  • Lower thresholds → standard security
  • Higher thresholds → state-actor-resistant security (corresponding to SSL frameworks)

Pause Conditions

Explicit conditions for pausing development if thresholds are crossed without corresponding controls.

Evaluation Schedules

  • Pre-deployment evaluation
  • During-training evaluation
  • Post-deployment monitoring

Accountability Mechanisms

  • Whistleblower protections
  • Board-level reporting
  • Public disclosure requirements

How FSFs Operationalize Strategy

FSFs are the corporate-level implementation of:

Limitations

Voluntary

FSFs are self-imposed. Without external enforcement, they can be:

  • Modified or abandoned under competitive pressure
  • Weakened through redefinition of thresholds
  • Treated as marketing rather than operational constraint (the safety washing concern)

Inconsistent

Different companies use different threshold definitions, evaluation methodologies, security tiers. Comparison across FSFs is difficult.

Single-Company Scope

Even fully-honored, an FSF only governs one company. Systemic risks from competitive dynamics aren’t addressed.

Verification

External parties cannot easily verify whether internal commitments are honored without privileged access to the company’s evaluations and decisions.

Why FSFs Matter Despite Limitations

The Atlas’s argument: FSFs are imperfect but valuable as:

  • Bridge to regulation — governments often codify what leading companies are already doing
  • Industry coordination point — a public common framework reduces race-to-the-bottom
  • Internal accountability — published commitments create pressure on internal decision-making
  • Demonstration — proves operational feasibility of safety measures

The Brussels Effect applies in reverse here: publicly-committed corporate practices can become regulatory floors.

Connection to Wiki

Sources cited

Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running, not by hand.