An Approach to Technical AGI Safety and Security
Rohin Shah, Alex Irpan, Alexander Matt Turner, Anna Wang, Arthur Conmy, David Lindner, … (+24 more) — 2025-04-02 — Google DeepMind, Anthropic, Stanford University, UC Berkeley, Redwood Research — arXiv
Summary
A comprehensive technical framework for AGI safety that addresses both misuse and misalignment through multiple layers of defense: identifying dangerous capabilities, security measures, model-level alignment (amplified oversight, robust training), and system-level monitoring and access control. These layers are supported by techniques from interpretability and uncertainty estimation, and culminate in safety cases for AGI systems.
Source
- Link: https://arxiv.org/abs/2504.01849
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- google-deepmind — Labs (giant companies)