AI Safety Atlas Ch.4 — Governance Problems

AI governance differs fundamentally from traditional technology regulation because AI is simultaneously a general-purpose technology, an information processor, and potentially an intelligent system, a combination that creates challenges without precedent.

Why Traditional Governance Fails

Conventional approaches (pharmaceutical clinical trials, nuclear treaties, facility monitoring) assume:

  • Predictable development paths
  • Clear, narrow applications
  • Controllable physical infrastructure

AI breaks all three:

  • General-purpose nature — one system reshapes healthcare, finance, transportation, education simultaneously; sector-specific regulation is insufficient
  • Information capabilities — generates and manipulates content at scales conventional frameworks weren’t designed for
  • Intelligence dimension — sufficiently capable systems may evade controls or pursue unintended objectives, a dynamic without precedent

Three Fundamental Problems

1. Unexpected Capabilities

Foundation models exhibit emergent abilities appearing suddenly with scale. GPT-3 unexpectedly performed arithmetic; later models showed unanticipated reasoning. Current evaluations cannot “guarantee absence of unknown threats, forecast new emergent abilities, or assess risks from autonomous systems.” Testing best practices remain nascent.

2. Deployment Safety

Once released, AI systems get repurposed beyond their intended uses: the same dialogue model can generate misinformation at scale or assist cyberattacks. Jailbreaks bypass built-in safety measures. Autonomous AI agents amplify these risks by chaining capabilities over extended periods, making post-deployment behavior increasingly difficult to predict.

3. Proliferation

“AI models are patterns of numbers instantly copied and transmitted globally.” Model weights spread via cyberattacks, insider leaks, and rapid replication. Open-source clones of models like ChatGPT can have their safety features stripped, revealing dangerous capabilities. Model distillation extracts capabilities without any access to the weights. Unlike nuclear materials, containment is fundamentally impossible.

Effective Governance Targets

Successful intervention points need three properties:

  • Measurability — compute used in training measurable in FLOPs (clear thresholds, compliance monitoring)
  • Controllability — practical mechanisms must exist (semiconductor supply-chain chokepoints enable export controls)
  • Meaningfulness — must address fundamental capability/risk drivers (regulating UIs ≠ preventing emergent capability)
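The measurability point can be made concrete. Training compute is commonly estimated with the C ≈ 6·N·D rule of thumb (N = parameter count, D = training tokens), which is what makes FLOP-based thresholds auditable. Below is a minimal sketch of such a threshold check; the 1e25 FLOP figure and the function names are illustrative assumptions, not a specific regulation's values.

```python
# Sketch: checking a training run against a compute-based reporting threshold.
# Assumes the common C ~= 6 * N * D approximation for total training FLOPs.
# The threshold below is illustrative, not drawn from any particular statute.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs via C ~= 6ND."""
    return 6.0 * n_params * n_tokens

def exceeds_threshold(n_params: float, n_tokens: float,
                      threshold_flops: float = 1e25) -> bool:
    """Would this run cross the (hypothetical) reporting threshold?"""
    return training_flops(n_params, n_tokens) >= threshold_flops

# Example: a 70B-parameter model trained on 2T tokens.
flops = training_flops(70e9, 2e12)          # 6 * 7e10 * 2e12 = 8.4e23 FLOPs
print(f"{flops:.2e}", exceeds_threshold(70e9, 2e12))
```

Because both inputs are observable (chip-hours imply N·D to within a constant factor), regulators can monitor compliance without inspecting model internals, which is exactly the property the bullet list above requires.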

Key intervention points span the development pipeline:

  • Early: compute infrastructure, training data
  • During: safety frameworks, monitoring systems
  • Deployment: monitoring controls, post-deployment measures

Connection to Wiki

This subchapter provides the analytical foundation for the rest of Ch.4: the three target properties (measurable, controllable, meaningful) serve as the criterion for evaluating any governance intervention and frame the chapter's priorities. It also deepens existing wiki pages.