DeepMind
DeepMind is Google’s AI research laboratory, widely recognized as one of the world’s leading AI research organizations. Founded in London in 2010 and acquired by Google in 2014, it has produced landmark results in reinforcement learning, protein structure prediction (AlphaFold), and large language models (Gemini). Within the AI safety ecosystem, DeepMind is one of the three major frontier AI labs, alongside OpenAI and Anthropic.
Structure: research laboratory subsidiary of a for-profit (Alphabet/Google).
Safety Teams (per Shallow Review 2025)
DeepMind’s safety org is unusually granular and team-driven:
- Amplified oversight — scalable supervision of capable models
- Interpretability — mechanistic understanding of internal model behavior
- ASAT eng — automated alignment research engineering
- Causal Incentives Working Group
- Frontier Safety Risk Assessment — evals, threat models, the Frontier Safety Framework
- Mitigations — banning accounts, refusal training, jailbreak robustness
- Loss of Control — control research, alignment evals (see the sketch after this list)
(Detailed team structure diagram maintained externally.)
Risk Management Framework
DeepMind operates under the Frontier Safety Framework, which ties safety requirements to capability levels. This shares the structural pattern of Anthropic’s Responsible Scaling Policy and OpenAI’s Preparedness Framework, with lab-specific thresholds and protocols.
Public Alignment Agenda
An Approach to Technical AGI Safety and Security (April 2025) is DeepMind’s most recent public statement of its overall safety approach.
Key People
Rohin Shah, Allan Dafoe, Anca Dragan, Alex Irpan, Alex Turner, Anna Wang, Arthur Conmy, David Lindner, Jonah Brown-Cohen, Lewis Ho, Neel Nanda, Raluca Ada Popa, Rishub Jain, Rory Greig, Sebastian Farquhar, Senthooran Rajamanoharan, Sophie Bridgers, Tobi Ijitoye, Tom Everitt, Victoria Krakovna, Vikrant Varma, Zac Kenton, Four Flynn, Jonathan Richens, Lewis Smith, Janos Kramar, Matthew Rahtz, Mary Phuong, Erik Jenner.
Funding
Google. Explicit DeepMind spending in 2024 was approximately £1.3B (per UK Companies House filings); this figure excludes most Gemini compute and adjacent costs.
Notable 2025 Outputs (selection from SR2025)
- A Pragmatic Vision for Interpretability
- How Can Interpretability Researchers Help AGI Go Well?
- Evaluating Frontier Models for Stealth and Situational Awareness
- When Chain of Thought is Necessary, Language Models Struggle to Evade Monitors
- MONA: Managed Myopia with Approval Feedback
- Consistency Training Helps Stop Sycophancy and Jailbreaks
- Negative Results for SAEs On Downstream Tasks (GDM Mech Interp Update #2)
- Difficulties with Evaluating a Deception Detector for AIs
- Taking a responsible path to AGI
- Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance
- A Pragmatic Way to Measure Chain-of-Thought Monitorability
The full list of DeepMind-attributed papers from SR2025 is reachable via summary frontmatter (agendas: [...] entries containing google-deepmind) or via the wiki’s search tool.
Critiques
External critiques aggregated in SR2025: Stein-Perlman; Carlsmith on labs in general; underelicitation in eval reports; On Google’s Safety Plan.
Role in the AI Safety Landscape
- Scale of research — As part of Google, DeepMind has access to enormous compute resources and a large research staff, making it one of the most capable AI research organizations in the world.
- Breadth of approach — DeepMind pursues both fundamental AI research and applied safety work, contributing to areas including interpretability, robustness, and evaluation methodologies.
- Industry coordination — DeepMind’s Frontier Safety Framework, alongside OpenAI’s Preparedness Framework and Anthropic’s RSP, represents an emerging norm that frontier labs should have explicit safety policies tied to capability levels.
Related Pages
- ai-safety
- ai-alignment
- responsible-scaling-policy
- capability-evaluations
- openai
- anthropic
- holden-karnofsky
- nick-joseph
- interpretability
- robustness
- scalable-oversight
- ai-control
- 80k-podcast-holden-karnofsky-concrete-safety
- 80k-podcast-nick-joseph-anthropic-safety
- future-of-humanity-institute
- metr
- rob-wiblin
- 80k-podcast-olsson-ziegler-ml-engineering
- summary-bostrom-ai-policy
- laura-weidinger