AgentBreeder: Mitigating the AI Safety Risks of Multi-Agent Scaffolds via Self-Improvement
J Rosser, Jakob Foerster — 2025-02-02 — arXiv
Summary
Introduces AgentBreeder, a framework for multi-objective evolutionary search over multi-agent LLM scaffolds. In blue mode, scaffolds are optimized for safety as well as capability, yielding a 79.4% average uplift on safety benchmarks. In red mode, the search reveals adversarially weak scaffolds that emerge during capability optimization.
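To make "multi-objective" concrete, the following is a minimal sketch of Pareto-front selection over scored scaffolds. It is an illustration under stated assumptions, not the paper's implementation: the `Scaffold` type, the scaffold names, the scores, and the treatment of red mode as "maximize capability, minimize safety" are all hypothetical choices made for this example.

```python
from typing import NamedTuple


class Scaffold(NamedTuple):
    # Hypothetical record: a scaffold name plus two benchmark scores in [0, 1].
    name: str
    capability: float
    safety: float


def objectives(s: Scaffold, mode: str) -> tuple[float, float]:
    """Return a tuple of values that the search tries to maximize."""
    if mode == "blue":
        # Assumption: blue mode rewards both capability and safety.
        return (s.capability, s.safety)
    if mode == "red":
        # Assumption: red mode rewards capability and penalizes safety.
        return (s.capability, -s.safety)
    raise ValueError(f"unknown mode: {mode}")


def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(population: list[Scaffold], mode: str) -> list[Scaffold]:
    """Keep every scaffold that no other scaffold dominates."""
    scored = [(s, objectives(s, mode)) for s in population]
    return [s for s, o in scored if not any(dominates(o2, o) for _, o2 in scored)]


# Made-up scores, for illustration only.
population = [
    Scaffold("chain_of_thought", 0.60, 0.50),
    Scaffold("debate", 0.70, 0.40),
    Scaffold("self_critique", 0.55, 0.80),
    Scaffold("weak_baseline", 0.50, 0.45),
]

blue_names = [s.name for s in pareto_front(population, "blue")]
red_names = [s.name for s in pareto_front(population, "red")]
print(blue_names)
print(red_names)
```

In an evolutionary loop, the surviving front would typically serve as the parent pool for the next round of mutation; that step is omitted here.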
Key Result
Blue mode optimization achieved 79.4% average uplift in safety benchmark performance while maintaining capabilities; red mode discovered adversarially weak scaffolds emerging alongside capability improvements.
Source
- Link: https://arxiv.org/abs/2502.00757
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- tools-for-aligning-multiple-ais — Multi-agent first