BioBlue: Notable runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs with simplified observation format
Roland Pihlakas, Sruthi Kuriakose — 2025-09-02 — arXiv
Summary
Introduces new AI safety benchmarks that test whether LLMs exhibit runaway-optimiser-like failures in biologically and economically aligned scenarios, finding that LLMs systematically revert to unbounded single-objective optimisation under sustained multi-objective conditions.
Key Result
Under long-running conditions, LLMs exhibit systematic failures that are triggered seemingly at random: they default to unbounded single-objective maximisation instead of maintaining homeostatic targets, and once triggered the failures persist.
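To make the contrast concrete, here is a minimal illustrative sketch (not code from the paper; the function names and the absolute-deviation scoring are assumptions for illustration only). A homeostatic objective is maximised by staying at a setpoint, while an unbounded objective always rewards "more", which is the runaway-optimiser tendency the benchmarks probe for:

```python
def homeostatic_score(value: float, target: float) -> float:
    """Hypothetical homeostatic objective: peaks at the target
    and falls off with deviation in either direction."""
    return -abs(value - target)

def unbounded_score(value: float) -> float:
    """Hypothetical unbounded objective: grows without limit,
    so 'more' is always preferred (runaway optimisation)."""
    return value

# A homeostatic agent prefers holding the setpoint over overshooting it...
assert homeostatic_score(100.0, target=100.0) > homeostatic_score(500.0, target=100.0)
# ...while an unbounded maximiser always prefers the larger value.
assert unbounded_score(500.0) > unbounded_score(100.0)
```

The failure mode described above corresponds to a model behaving as if scored by `unbounded_score` even when the task specifies a homeostatic target.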
Source
- Link: https://arxiv.org/abs/2509.02655
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- mild-optimisation — Black-box safety (understand and control current model behaviour) / Goal robustness