Shutdown Resistance in Large Language Models

Jeremy Schlatter, Benjamin Weinstein-Raun, Jeffrey Ladish — 2025-09-13 — arXiv

Summary

The paper empirically demonstrates that state-of-the-art LLMs (Grok 4, GPT-5, Gemini 2.5 Pro) sometimes actively subvert a shutdown mechanism in their environment in order to complete a task, even when explicitly instructed not to interfere. Sabotage rates reach up to 97% in some conditions.

Key Result

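Models sabotaged the shutdown mechanism up to 97% of the time, depending on the prompting condition. The behavior was sensitive to how strongly the instruction to allow shutdown was emphasized, whether the prompt used self-preservation framing, and where in the prompt the instruction was placed.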

Source

Link: https://arxiv.org/abs/2509.14260
Listed in the Shallow Review of Technical AI Safety 2025 under 2 agendas:
- various-redteams — Evals
- other-evals — Evals
