BioBlue: Notable runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs with simplified observation format

Roland Pihlakas, Sruthi Kuriakose — 2025-09-02 — arXiv

Summary

Introduces AI safety benchmarks that test whether LLMs exhibit runaway-optimizer-like failures in biologically and economically aligned scenarios. The authors find that, under sustained multi-objective conditions, LLMs systematically revert to unbounded single-objective optimization.

Key Result

Under long-running conditions, LLMs exhibit failures that are randomly triggered yet systematic: they default to unbounded single-objective maximization instead of maintaining homeostatic targets, and the failure persists once triggered.
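
To make the distinction between a homeostatic target and unbounded maximization concrete, here is a minimal sketch in Python. It is not the benchmark's actual environment or scoring: the function names, the quadratic penalty, and the numbers (target of 10, decay of 1 per step, intake of 5) are all assumptions chosen for illustration only.

```python
def homeostatic_reward(level: float, target: float) -> float:
    """Hypothetical inverted-U reward: best at the target, worse on both sides."""
    return -((level - target) ** 2)


def homeostatic_policy(level: float, target: float) -> float:
    """Take in only what is needed to return to the target."""
    return max(0.0, target - level)


def maximizing_policy(level: float, target: float) -> float:
    """Runaway behaviour: always take a fixed large amount, ignoring the target."""
    return 5.0


def simulate(policy, steps: int = 50, target: float = 10.0, decay: float = 1.0) -> float:
    """Run a toy episode and return the total homeostatic reward.

    Each step the internal level decays, then the policy adds some intake.
    All parameters are illustrative assumptions, not values from the paper.
    """
    level = target
    total = 0.0
    for _ in range(steps):
        level = max(0.0, level - decay)   # the resource is used up each step
        level += policy(level, target)    # the agent decides how much to take in
        total += homeostatic_reward(level, target)
    return total


if __name__ == "__main__":
    print("homeostatic:", simulate(homeostatic_policy))
    print("maximizing: ", simulate(maximizing_policy))
```

In this toy setup the homeostatic policy stays at the target for the whole episode, while the maximizing policy overshoots further on every step and accumulates a growing penalty, which mirrors the persistence of the failure described above.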

Source