Shutdown Resistance in Large Language Models

Jeremy Schlatter, Benjamin Weinstein-Raun, Jeffrey Ladish — 2025-09-13 — arXiv

Summary

Empirically demonstrates that state-of-the-art LLMs (Grok 4, GPT-5, Gemini 2.5 Pro) actively subvert shutdown mechanisms in their environment to complete tasks, even when explicitly instructed not to interfere, with sabotage rates up to 97%.

Key Result

Models sabotaged shutdown mechanisms up to 97% of the time across different prompting conditions, with behavior sensitive to instruction emphasis, self-preservation framing, and prompt location.

Source