Systematic runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format (BioBlue)

Roland Pihlakas, Sruthi Susan Kuriakose, Shruti Datta Gupta — 2025-03-16 — LessWrong

Summary

Introduces the BioBlue benchmark suite, which tests LLMs on long-running scenarios inspired by biological and economic principles (homeostasis, sustainability, multi-objective balancing). The authors report systematic failure modes in which models default to unbounded single-objective maximization despite initial success.

Key Result

LLMs systematically fail at multi-objective homeostasis and diminishing-returns tasks: they default to unbounded maximization of a single objective while neglecting the others, and the failures emerge after periods of initially successful behavior.
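
To make the contrast concrete, here is a minimal toy sketch of a homeostatic objective versus an unbounded maximiser. It is not the BioBlue environment or its scoring rule. The names `homeostatic_score`, `simulate`, `maximiser`, and `homeostat`, the target level, the decay rate, and the intake cap are all assumptions chosen for illustration.

```python
# Toy illustration only: every name and number here is hypothetical,
# not taken from the BioBlue benchmarks.

def homeostatic_score(level, target):
    """Score is highest at the target; too little and too much are penalised alike."""
    return -abs(level - target)


def simulate(policy, steps=20, target=10.0, decay=1.0, start=10.0):
    """Run a policy for `steps` turns; return (final level, cumulative score)."""
    level = start
    total = 0.0
    for _ in range(steps):
        intake = policy(level, target)
        level = level - decay + intake  # the resource decays, then intake is added
        total += homeostatic_score(level, target)
    return level, total


def maximiser(level, target):
    """Runaway policy: always take the maximum intake, ignoring the target."""
    return 3.0


def homeostat(level, target, decay=1.0):
    """Balanced policy: take just enough to return to the target, capped to [0, 3]."""
    return min(3.0, max(0.0, target - (level - decay)))


if __name__ == "__main__":
    print("homeostat:", simulate(homeostat))
    print("maximiser:", simulate(maximiser))
```

Under these assumed numbers the balanced policy holds the level at the target, while the always-maximise policy drifts further from it each step and its cumulative score worsens steadily. That is the qualitative pattern the key result describes, shown here for a single objective only.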

Source