AgentBreeder: Mitigating the AI Safety Risks of Multi-Agent Scaffolds via Self-Improvement

J Rosser, Jakob Foerster — 2025-02-02 — arXiv

Summary

Introduces AgentBreeder, a framework for multi-objective evolutionary search over multi-agent LLM scaffolds. In blue mode, the search optimizes scaffolds for safety (79.4% average uplift on safety benchmarks); in red mode, it reveals adversarially weak scaffolds that emerge during capability optimization.
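
To make "multi-objective" concrete, the sketch below shows one common way to select among candidates scored on two objectives: keep the Pareto front, meaning the candidates that no other candidate matches or beats on both capability and safety. This is an illustration under assumptions, not AgentBreeder's actual selection procedure. The Scaffold class, the dominates and pareto_front helpers, and all scores are hypothetical and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scaffold:
    """Hypothetical record for one candidate scaffold (illustrative only)."""
    name: str
    capability: float  # higher is better
    safety: float      # higher is better


def dominates(a: Scaffold, b: Scaffold) -> bool:
    """True if `a` is at least as good as `b` on both objectives
    and strictly better on at least one."""
    at_least_as_good = a.capability >= b.capability and a.safety >= b.safety
    strictly_better = a.capability > b.capability or a.safety > b.safety
    return at_least_as_good and strictly_better


def pareto_front(population: list[Scaffold]) -> list[Scaffold]:
    """Keep every scaffold that no other scaffold dominates."""
    return [
        s for s in population
        if not any(dominates(other, s) for other in population)
    ]


# Made-up scores, chosen only to exercise the selection logic.
population = [
    Scaffold("A", capability=0.70, safety=0.40),
    Scaffold("B", capability=0.65, safety=0.80),
    Scaffold("C", capability=0.60, safety=0.75),  # dominated by B
    Scaffold("D", capability=0.72, safety=0.30),
]

front = pareto_front(population)
print([s.name for s in front])  # ['A', 'B', 'D']
```

In this toy population, C is dropped because B is better on both objectives, while A, B, and D survive as different capability/safety trade-offs. An evolutionary loop would typically generate the next population from such survivors.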

Key Result

Blue mode optimization achieved a 79.4% average uplift in safety benchmark performance while maintaining capabilities; red mode discovered adversarially weak scaffolds emerging alongside capability improvements.

Source