AI Safety Atlas Ch.4 — Implementation

Source: Implementation

How governance actually gets operationalized: through AI safety standards, regulatory visibility, and compliance enforcement.

AI Safety Standards

Three distinct national models:

  • EU — mandates Codes of Practice for General-Purpose AI models
  • US — voluntary NIST AI Risk Management Framework
  • China — Standardization Administration coordinates 100+ technical/ethical specifications via centralized agencies; the approach is “highly centralized and closely linked to its broader geopolitical ambitions”

Standards build safety culture through four mechanisms:

  • Establishing rules/expectations within domestic ecosystems
  • Embedding researchers in accountability networks
  • Internalizing safety routines through implementation
  • Reinforcing safety considerations when embedded in products

Regulatory Visibility

The ASPIRE Framework

Six criteria for effective external scrutiny (a checklist sketch follows the list):

  • Access to systems and information
  • Searching attitude toward vulnerabilities
  • Proportionality to actual risks
  • Independence from developers
  • Resources (adequate)
  • Expertise (necessary)
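
A minimal sketch of how the six criteria could be recorded as a checklist when evaluating a scrutiny arrangement; the class name, field names, and pass/fail scoring are illustrative assumptions, not part of the ASPIRE framework itself.

```python
from dataclasses import dataclass, fields

@dataclass
class AspireAssessment:
    """Checklist for judging an external-scrutiny arrangement against ASPIRE.
    Each boolean records whether a criterion is met (names are illustrative)."""
    access: bool           # Access to systems and information
    searching: bool        # Searching attitude toward vulnerabilities
    proportionality: bool  # Proportionality to actual risks
    independence: bool     # Independence from developers
    resources: bool        # Adequate resources
    expertise: bool        # Necessary expertise

    def unmet(self) -> list[str]:
        """Names of the criteria the arrangement fails."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

# Example: an in-house audit team with full access but no independence
internal_audit = AspireAssessment(
    access=True, searching=True, proportionality=True,
    independence=False, resources=True, expertise=True,
)
print(internal_audit.unmet())  # ['independence']
```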

Model Registries

“Centralized databases that include architectural details, training procedures, performance metrics, and societal impact assessments.” Documentation typically covers identification, technical specifications, performance benchmarks, impact assessments, and deployment plans.
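
A minimal sketch of what a single registry record could contain, assuming a flat schema; the class and field names are illustrative assumptions that mirror the documentation categories above, not a real registry format.

```python
from dataclasses import dataclass, field

@dataclass
class ModelRegistryEntry:
    """One record in a hypothetical model registry; fields mirror the
    documentation categories listed above (names are assumptions)."""
    # Identification
    model_name: str
    developer: str
    version: str
    # Technical specifications
    architecture: str
    parameter_count: int
    training_compute_flop: float
    # Performance benchmarks (benchmark name -> score)
    benchmarks: dict[str, float] = field(default_factory=dict)
    # Societal impact assessment and deployment plan (summaries or references)
    impact_assessment: str = ""
    deployment_plan: str = ""

entry = ModelRegistryEntry(
    model_name="example-model",
    developer="Example Lab",
    version="1.0",
    architecture="decoder-only transformer",
    parameter_count=70_000_000_000,
    training_compute_flop=1e25,
    benchmarks={"example-benchmark": 0.78},
    impact_assessment="Summary of societal impact assessment.",
    deployment_plan="Staged API release with usage monitoring.",
)
```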

Know Your Customer (KYC) for Compute

Adapted from financial-services KYC, applied to compute access. Targets capability thresholds → preventative rather than reactive. Frontier models concentrate in hyperscale providers — those providers serve as natural regulatory chokepoints. Global compute distribution creates jurisdictional fragmentation.
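
A minimal sketch of the preventative threshold check a compute provider might run before allocating hardware; the FLOP threshold, function names, and return values are illustrative assumptions, not a real regulatory rule.

```python
# Illustrative KYC-for-compute gate run by a hyperscale provider before
# allocating hardware for a training run. The 1e26 FLOP trigger and all
# names are assumptions for the sketch.

VERIFICATION_THRESHOLD_FLOP = 1e26  # hypothetical capability threshold

def requires_enhanced_kyc(requested_flop: float) -> bool:
    """True when a requested training run crosses the compute threshold."""
    return requested_flop >= VERIFICATION_THRESHOLD_FLOP

def screen_request(customer_id: str, identity_verified: bool,
                   requested_flop: float) -> str:
    """Preventative check: hold unverified customers above the threshold."""
    if requires_enhanced_kyc(requested_flop) and not identity_verified:
        return f"hold:{customer_id}"  # escalate for identity / end-use review
    return f"approve:{customer_id}"

print(screen_request("acct-42", identity_verified=False, requested_flop=3e26))  # hold:acct-42
print(screen_request("acct-42", identity_verified=True, requested_flop=3e26))   # approve:acct-42
```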

Incident Reporting

Fragmented across jurisdictions (a rough routing sketch follows the list):

  • EU AI Act — mandates “serious incidents” reporting
  • China — building centralized critical-failure infrastructure
  • US — sector-specific only
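
As a very rough illustration of this fragmentation, the sketch below maps one incident to the regimes it might need to be reported under; the jurisdiction codes and encoded rules are coarse assumptions drawn only from the bullets above, not legal guidance.

```python
# Same incident, different obligations depending on jurisdiction.
def reporting_obligations(jurisdictions: list[str], serious: bool,
                          regulated_sector: bool) -> list[str]:
    """Which reporting regimes one incident plausibly falls under."""
    obligations = []
    for j in jurisdictions:
        if j == "EU" and serious:
            obligations.append("EU AI Act serious-incident report")
        elif j == "CN":
            obligations.append("Centralized critical-failure report")
        elif j == "US" and regulated_sector:
            obligations.append("Sector-specific report to the relevant agency")
    return obligations

# One serious incident deployed in all three jurisdictions, outside any
# US-regulated sector: two reports required, none in the US.
print(reporting_obligations(["EU", "US", "CN"], serious=True, regulated_sector=False))
```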

Ensuring Compliance

Licensing Regimes

Mirror pharmaceutical and nuclear models:

  • Formal approval before deployment
  • Periodic audits
  • License revocation capabilities

Developers must submit safety cases: “formal argument[s] supported by evidence showing that a system meets safety thresholds for deployment.” These include threat modeling, red-teaming results, and monitoring plans.
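
A minimal sketch of how the pieces of a safety case could be bundled for submission; the class, field names, and completeness check are illustrative assumptions, not a prescribed format.

```python
from dataclasses import dataclass, field

@dataclass
class SafetyCase:
    """Illustrative bundle of artifacts a licensing regime might require;
    field names follow the components listed above and are not a standard."""
    system_name: str
    claim: str                     # safety threshold the case argues is met
    threat_model: list[str] = field(default_factory=list)
    red_team_findings: list[str] = field(default_factory=list)
    monitoring_plan: str = ""
    supporting_evidence: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """Crude completeness check a reviewer might run before a full audit."""
        return bool(self.claim and self.threat_model
                    and self.red_team_findings and self.monitoring_plan)

case = SafetyCase(
    system_name="example-model",
    claim="Meets the agreed deployment threshold for misuse resistance.",
    threat_model=["jailbreak-driven misuse", "model weight exfiltration"],
    red_team_findings=["jailbreak success rate below the agreed bound"],
    monitoring_plan="Post-deployment abuse monitoring with quarterly reports.",
)
print(case.is_complete())  # True
```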

Enforcement

  • EU AI Office — investigates violations, fines up to 3% of global turnover
  • China Cyberspace Administration — centralized enforcement under vertical frameworks; lacks transparent procedural safeguards
  • US — fragmented across agencies, no national licensing authority

Six Fundamental Limitations

  1. Technical Understanding Gaps — RLHF and current techniques may fail catastrophically with more capable systems; frameworks built on potentially obsolete approaches
  2. Measurement Challenges — robust metrics for deception tendency, autonomous-improvement resistance, etc. remain unavailable; “compliance becomes interpretation rather than verification”
  3. Expertise Shortages — individuals understanding both advanced AI and governance are critically limited; talent concentrated in dominant firms
  4. Coordination Friction — each additional stakeholder/requirement adds friction; excessive bureaucracy drives development toward less responsible actors
  5. Speed Mismatches — “AI advances monthly while international agreements require years of negotiation”
  6. Regulatory Arbitrage — strict European requirements may relocate development to permissive jurisdictions; models trained anywhere deploy everywhere

Connection to Wiki

This is the operational layer: how governance theories become practice. Connections: