AI Safety Atlas Ch.4 — Implementation
Source: Implementation
How governance actually gets operationalized: AI safety standards, regulatory visibility, and compliance enforcement.
AI Safety Standards
Three distinct jurisdictional models:
- EU — mandates Codes of Practice for General-Purpose AI models
- US — voluntary NIST AI Risk Management Framework
- China — Standardization Administration coordinates 100+ technical/ethical specifications via centralized agencies, an approach “highly centralized and closely linked to its broader geopolitical ambitions”
Standards build safety culture through four mechanisms:
- Establishing rules/expectations within domestic ecosystems
- Embedding researchers in accountability networks
- Internalizing safety routines through implementation
- Reinforcing safety considerations when embedded in products
Regulatory Visibility
The ASPIRE Framework
Six criteria for effective external scrutiny (see the rubric sketch after this list):
- Access to systems and information
- Searching attitude toward vulnerabilities
- Proportionality to actual risks
- Independence from developers
- Resources (adequate)
- Expertise (necessary)
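A minimal sketch of ASPIRE as a scoring rubric, assuming a 0–2 scale per criterion; the `AspireAssessment` class, the scale, and the weakest-link helper are illustrative conveniences, not part of the framework itself:

```python
from dataclasses import dataclass, fields

@dataclass
class AspireAssessment:
    """Score one external-scrutiny arrangement on the six ASPIRE criteria.

    Each criterion is rated 0 (absent), 1 (partial), or 2 (satisfied).
    The scale and aggregation are illustrative assumptions.
    """
    access: int           # access to systems and information
    searching: int        # searching attitude toward vulnerabilities
    proportionality: int  # proportionality to actual risks
    independence: int     # independence from developers
    resources: int        # adequate resources
    expertise: int        # necessary expertise

    def weakest_criteria(self) -> list[str]:
        """Scrutiny fails at its weakest link: return the lowest-scored criteria."""
        scores = {f.name: getattr(self, f.name) for f in fields(self)}
        low = min(scores.values())
        return [name for name, s in scores.items() if s == low]

# Example: a well-resourced, expert auditor that lacks system access
audit = AspireAssessment(access=0, searching=2, proportionality=1,
                         independence=2, resources=2, expertise=2)
print(audit.weakest_criteria())  # ['access']
```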
Model Registries
“Centralized databases that include architectural details, training procedures, performance metrics, and societal impact assessments.” Documentation typically covers model identification, technical specifications, performance benchmarks, impact assessments, and deployment plans.
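One plausible shape for a registry entry, mirroring the documentation categories above; the `RegistryEntry` class and its field names are assumptions, not any actual registry’s schema:

```python
from dataclasses import dataclass, field

@dataclass
class RegistryEntry:
    """One model's record in a hypothetical centralized registry.

    Sections mirror the documentation categories: identification, technical
    specs, performance benchmarks, impact assessments, deployment plans.
    """
    # Identification
    model_id: str
    developer: str
    version: str
    # Technical specifications
    architecture: str            # e.g. "decoder-only transformer"
    parameter_count: int
    training_compute_flop: float
    # Performance benchmarks: benchmark name -> score
    benchmarks: dict[str, float] = field(default_factory=dict)
    # Societal impact assessments (document references, not contents)
    impact_assessments: list[str] = field(default_factory=list)
    # Deployment plan summary
    deployment_plan: str = ""

entry = RegistryEntry(
    model_id="example-model-1",  # hypothetical identifiers throughout
    developer="Example Lab",
    version="1.0",
    architecture="decoder-only transformer",
    parameter_count=70_000_000_000,
    training_compute_flop=1e25,
    benchmarks={"MMLU": 0.78},
    impact_assessments=["impact-assessment-2025.pdf"],
    deployment_plan="Staged API release with rate limits",
)
```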
Know Your Customer (KYC) for Compute
Adapted from financial-services KYC and applied to compute access. Because it targets capability thresholds, oversight becomes preventative rather than reactive. Frontier-scale training concentrates in hyperscale providers, which serve as natural regulatory chokepoints; global compute distribution, however, creates jurisdictional fragmentation.
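A sketch of the gate a provider might run before allocating compute, assuming a hypothetical 10^26 FLOP threshold and a simple verified/unverified customer flag; the threshold value and function names are illustrative:

```python
# Hypothetical KYC gate at a compute provider: jobs above a capability-proxy
# threshold require verified customer identity before allocation.
FRONTIER_FLOP_THRESHOLD = 1e26  # illustrative; real thresholds are policy choices

def kyc_gate(requested_flop: float, customer_verified: bool) -> str:
    """Decide whether a training-compute request can proceed.

    Preventative rather than reactive: the check runs before training
    starts, not after a harmful model already exists.
    """
    if requested_flop < FRONTIER_FLOP_THRESHOLD:
        return "approve"  # below threshold: no enhanced checks
    if not customer_verified:
        return "deny: complete identity verification first"
    return "approve: log to regulator"  # above threshold: approved but reportable

print(kyc_gate(1e24, customer_verified=False))  # approve
print(kyc_gate(3e26, customer_verified=False))  # deny: complete identity ...
print(kyc_gate(3e26, customer_verified=True))   # approve: log to regulator
```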
Incident Reporting
Fragmented across jurisdictions (see the routing sketch after this list):
- EU AI Act — mandates reporting of “serious incidents”
- China — building centralized critical-failure infrastructure
- US — sector-specific only
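The fragmentation reads naturally as a routing problem: one incident, different obligations per jurisdiction. A simplified sketch whose rules are rough stand-ins for the regimes above, not the actual legal texts:

```python
# Illustrative routing of one AI incident report under fragmented regimes.
# The rules below are simplified stand-ins, not real legal requirements.

def reporting_obligations(jurisdiction: str, severity: str, sector: str) -> list[str]:
    obligations = []
    if jurisdiction == "EU" and severity == "serious":
        obligations.append("report to EU authorities (AI Act serious-incident duty)")
    if jurisdiction == "CN":
        obligations.append("report to centralized critical-failure registry")
    if jurisdiction == "US" and sector in {"healthcare", "finance", "aviation"}:
        obligations.append(f"report to {sector} sector regulator only")
    return obligations or ["no mandatory AI-specific report"]

print(reporting_obligations("EU", "serious", "consumer"))
print(reporting_obligations("US", "serious", "consumer"))  # falls through the gaps
```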
Ensuring Compliance
Licensing Regimes
Mirror pharmaceutical and nuclear models:
- Formal approval before deployment
- Periodic audits
- License revocation capabilities
Developers must submit safety cases — “formal argument[s] supported by evidence showing that a system meets safety thresholds for deployment.” A safety case includes threat modeling, red-teaming results, and monitoring plans.
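A safety case can be pictured as structured data, with the kind of completeness check a licensing body might run before substantive review; the `SafetyCase` fields and `missing_evidence` helper are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class SafetyCase:
    """Formal argument plus evidence that a system meets deployment thresholds."""
    claim: str                    # top-level safety claim
    threat_model: str             # who/what could cause harm, and how
    red_team_results: list[str] = field(default_factory=list)
    monitoring_plan: str = ""     # post-deployment monitoring

    def missing_evidence(self) -> list[str]:
        """Illustrative pre-screening before a licensing review."""
        gaps = []
        if not self.threat_model:
            gaps.append("threat model")
        if not self.red_team_results:
            gaps.append("red-teaming results")
        if not self.monitoring_plan:
            gaps.append("monitoring plan")
        return gaps

case = SafetyCase(
    claim="Model X is safe to deploy via rate-limited API",
    threat_model="misuse for cyber-offense, assessed via capability evaluations",
    red_team_results=["external red-team report, 2025-06"],
)
print(case.missing_evidence())  # ['monitoring plan'] -> not yet submittable
```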
Enforcement
- EU AI Office — investigates violations, fines up to 3% of global annual turnover
- China Cyberspace Administration — centralized enforcement under vertical frameworks; lacks transparent procedural safeguards
- US — fragmented across agencies, no national licensing authority
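For scale, a worked example of the 3% ceiling; the turnover figure is hypothetical, since the source gives only the percentage:

```python
# Hypothetical: fine ceiling at 3% of global annual turnover.
global_turnover_eur = 40_000_000_000   # assumed turnover, for illustration only
fine_cap_eur = 0.03 * global_turnover_eur
print(f"{fine_cap_eur:,.0f}")          # 1,200,000,000 -> a EUR 1.2B ceiling
```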
Six Fundamental Limitations
- Technical Understanding Gaps — RLHF and current techniques may fail catastrophically with more capable systems; frameworks built on potentially obsolete approaches
- Measurement Challenges — robust metrics for deception tendency, autonomous-improvement resistance, etc. remain unavailable; “compliance becomes interpretation rather than verification”
- Expertise Shortages — individuals understanding both advanced AI and governance are critically limited; talent concentrated in dominant firms
- Coordination Friction — each additional stakeholder/requirement adds friction; excessive bureaucracy drives development toward less responsible actors
- Speed Mismatches — “AI advances monthly while international agreements require years of negotiation”
- Regulatory Arbitrage — strict European requirements may relocate development to permissive jurisdictions; models trained anywhere deploy everywhere
Connection to Wiki
This is the operational layer: how governance theories become practice. Connections:
- eu-ai-act — concrete instance with 3% turnover fines
- ai-safety-institute — the institutional embodiment of regulatory visibility
- ai-safety-culture, ai-risk-management — operational pillars of standards
- capability-evaluations — the evaluation backbone of safety cases
- responsible-scaling-policy — the FSF most aligned with this implementation pattern
- atlas-ch3-strategies-06-socio-technical-strategies — the strategy → implementation bridge