An Approach to Technical AGI Safety and Security

Rohin Shah, Alex Irpan, Alexander Matt Turner, Anna Wang, Arthur Conmy, David Lindner, … (+24 more) — 2025-04-02 — Google DeepMind, Anthropic, Stanford University, UC Berkeley, Redwood Research — arXiv

Summary

A comprehensive technical framework for AGI safety that addresses misuse and misalignment through multiple layers of defense. The layers include dangerous capability identification, security measures, model-level alignment (amplified oversight, robust training), and system-level monitoring and access control. Interpretability and uncertainty estimation serve as supporting techniques, and the approach culminates in safety cases for AGI systems.

Source