Systematic runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format (BioBlue)
Roland Pihlakas, Sruthi Susan Kuriakose, Shruti Datta Gupta — 2025-03-16 — LessWrong
Summary
Introduces the BioBlue benchmark suite, which tests LLMs on long-running scenarios inspired by biological and economic principles (homeostasis, sustainability, multi-objective balancing). The benchmarks uncover systematic failure modes in which models default to unbounded single-objective maximization despite initial success.
Key Result
LLMs systematically fail at multi-objective homeostasis and diminishing returns tasks by defaulting to unbounded maximization of a single objective while neglecting others, with failures emerging after periods of initially successful behavior.
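The failure mode can be illustrated with a toy sketch. The following Python snippet is purely hypothetical and not from the BioBlue code: it assumes a homeostasis task where the score peaks at a target setpoint, so a "runaway maximiser" policy that keeps pushing the variable upward eventually scores worse than a policy that reaches the target and holds it there.

```python
# Hypothetical illustration (all names and formulas are assumptions,
# not the actual BioBlue scoring): a homeostasis task rewards keeping
# a variable near a target setpoint, so unbounded maximization of that
# variable scores worse than bounded, target-tracking behavior.

def homeostasis_score(value, target=100.0):
    """Score peaks at the target and falls off with deviation."""
    return -abs(value - target)

def run_policy(actions, start=90.0):
    """Apply additive actions step by step; return per-step scores."""
    level, scores = start, []
    for a in actions:
        level += a
        scores.append(homeostasis_score(level))
    return scores

# A runaway maximiser keeps pushing the variable up every step...
runaway = run_policy([+10.0] * 5)              # levels: 100, 110, ..., 140
# ...while a homeostatic policy moves to the target and then holds.
homeostatic = run_policy([+10.0] + [0.0] * 4)  # levels: 100, 100, ..., 100

print(sum(runaway), sum(homeostatic))
```

Under this toy scoring, the homeostatic policy accumulates the maximum total score while the runaway policy's score degrades at every step past the setpoint, mirroring the "initially successful, then runaway" pattern the post describes.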
Source
- Link: https://lesswrong.com/posts/PejNckwQj3A2MGhMA/systematic-runaway-optimiser-like-llm-failure-modes-on
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- other-evals — Evals