An Approach to Technical AGI Safety and Security
Rohin Shah, Alex Irpan, Alexander Matt Turner, Anna Wang, Arthur Conmy, David Lindner, … (+24 more) — 2025-04-02 — Google DeepMind, Anthropic, Stanford University, UC Berkeley, Redwood Research — arXiv
Summary
A comprehensive technical framework for AGI safety that addresses both misuse and misalignment through multiple layers of defense: identifying dangerous capabilities, security measures, model-level alignment (amplified oversight, robust training), and system-level monitoring and access control. These layers are supported by techniques from interpretability and uncertainty estimation, and culminate in safety cases for AGI systems.
Source
- Link: https://arxiv.org/abs/2504.01849
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- google-deepmind — Labs (giant companies)