Systematic runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format (BioBlue)
Roland Pihlakas, Sruthi Susan Kuriakose, Shruti Datta Gupta — 2025-03-16 — LessWrong
Summary
Introduces the BioBlue benchmark suite, which tests LLMs on long-running scenarios inspired by biological and economic principles (homeostasis, sustainability, multi-objective balancing). The benchmarks uncover systematic failure modes in which models default to unbounded single-objective maximization despite initial success.
Key Result
LLMs systematically fail at multi-objective homeostasis and diminishing returns tasks by defaulting to unbounded maximization of a single objective while neglecting others, with failures emerging after periods of initially successful behavior.
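The failure mode can be illustrated with a toy sketch. The following Python snippet is purely hypothetical and not from the BioBlue code: it assumes a homeostasis task where the score peaks at a target setpoint, so a "runaway maximiser" policy that keeps pushing the variable upward eventually scores worse than a policy that reaches the target and holds it there.

```python
# Hypothetical illustration (all names and formulas are assumptions,
# not the actual BioBlue scoring): a homeostasis task rewards keeping
# a variable near a target setpoint, so unbounded maximization of that
# variable scores worse than bounded, target-tracking behavior.

def homeostasis_score(value, target=100.0):
    """Score peaks at the target and falls off with deviation."""
    return -abs(value - target)

def run_policy(actions, start=90.0):
    """Apply additive actions step by step; return per-step scores."""
    level, scores = start, []
    for a in actions:
        level += a
        scores.append(homeostasis_score(level))
    return scores

# A runaway maximiser keeps pushing the variable up every step...
runaway = run_policy([+10.0] * 5)              # levels: 100, 110, ..., 140
# ...while a homeostatic policy moves to the target and then holds.
homeostatic = run_policy([+10.0] + [0.0] * 4)  # levels: 100, 100, ..., 100

print(sum(runaway), sum(homeostatic))
```

Under this toy scoring, the homeostatic policy accumulates the maximum total score while the runaway policy's score degrades at every step past the setpoint, mirroring the "initially successful, then runaway" pattern the post describes.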
Source
- Link: https://lesswrong.com/posts/PejNckwQj3A2MGhMA/systematic-runaway-optimiser-like-llm-failure-modes-on
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- other-evals — Evals