Evaluating Frontier Models for Stealth and Situational Awareness
Mary Phuong, Roland S. Zimmermann, Ziyue Wang, David Lindner, Victoria Krakovna, Sarah Cogan, … (+3 more) — 2025-05-02 — Google DeepMind — arXiv
Summary
Presents a suite of 16 evaluations measuring prerequisites for AI scheming behavior: 5 evaluations of stealth (ability to circumvent oversight) and 11 evaluations of situational awareness (instrumental reasoning about self and environment), demonstrating how these can inform scheming inability safety cases.
Key Result
Current frontier models show no concerning levels of either the situational awareness or the stealth capabilities required for successful scheming.
Source
- Link: https://arxiv.org/abs/2505.01420
- Listed in the Shallow Review of Technical AI Safety 2025 under 2 agendas:
  - google-deepmind — Labs (giant companies)
  - situational-awareness-and-self-awareness-evals — Evals