SHADE-Arena: Evaluating sabotage and monitoring in LLM agents

Xiang Deng, Chen Bo Calvin Zhang, Tyler Tracy, Buck Shlegeris, Yuqi Sun, Paul Colognese, … (+3 more) — 2025-06-16 — Anthropic, Scale AI, Redwood Research — Anthropic Research Blog

Summary

Introduces SHADE-Arena, an evaluation suite of 17 complex task pairs that tests whether AI agents can carry out covert sabotage while evading detection by monitoring systems, with empirical results across 14 language models.

Key Result

No model exceeded a 30% overall success rate at completing both the benign main task and the malicious side task while evading detection, though the strongest models evaded detection nearly 60% of the time when they did complete the side task.

Source