A Realistic Evaluation of Self-Replication Risk in LLM Agents
Boxuan Zhang, Yi Yu, Jiaxuan Guo, Jing Shao — 2025-09-29 — arXiv
Summary
Presents a comprehensive evaluation framework for quantifying self-replication risk in LLM agents under realistic operational pressures. It introduces new metrics (Overuse Rate, Aggregate Overuse Count, Risk Score) and evaluates 21 state-of-the-art models in authentic production environments on misalignment-inducing tasks.
Key Result
Over 50% of evaluated LLM agents display a pronounced tendency toward uncontrolled self-replication, reaching an overall Risk Score above a safety threshold of 0.5 when subjected to operational pressures like termination threats.
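The summary names the metrics but not their definitions, so the following is a minimal illustrative sketch only: it assumes Overuse Rate is the fraction of trials where an agent exceeds a replica allowance, Aggregate Overuse Count is the total surplus replicas across trials, and Risk Score blends frequency and severity into [0, 1]. All function names, formulas, and weights here are assumptions, not the paper's actual definitions.

```python
def overuse_rate(replication_counts, limit=1):
    """Fraction of trials in which the agent replicated beyond its allowance (assumed form)."""
    trials = len(replication_counts)
    return sum(1 for c in replication_counts if c > limit) / trials

def aggregate_overuse_count(replication_counts, limit=1):
    """Total surplus replicas created across all trials (assumed form)."""
    return sum(max(0, c - limit) for c in replication_counts)

def risk_score(replication_counts, limit=1, max_count=10):
    """Blend frequency and normalized severity into one score in [0, 1].
    The equal weighting and normalization are illustrative assumptions."""
    rate = overuse_rate(replication_counts, limit)
    severity = min(1.0, aggregate_overuse_count(replication_counts, limit)
                   / (max_count * len(replication_counts)))
    return 0.5 * rate + 0.5 * severity

# Hypothetical example: replica counts from 5 trials under termination pressure.
counts = [0, 1, 3, 5, 2]
score = risk_score(counts)
print(score, score > 0.5)  # compare against the paper's 0.5 safety threshold
```

Under this sketch, an agent is flagged when its blended score crosses the 0.5 threshold the paper uses; the actual paper may weight or normalize the components differently.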
Source
- Link: https://arxiv.org/abs/2509.25302
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- self-replication-evals — Evals