X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents

Salman Rahman, Liwei Jiang, James Shiffer, Genglin Liu, Sheriff Issaka, Md Rizwan Parvez, … (+4 more) — 2025-04-15 — University of Washington, UCLA, Microsoft — arXiv

Summary

Presents X-Teaming, a multi-agent framework for generating multi-turn jailbreaks that achieves attack success rates of up to 98.1% on frontier models, and introduces XGuard-Train, an open-source dataset of 30K multi-turn jailbreaks for safety training.

Key Result

Achieved a 96.2% attack success rate against Claude 3.7 Sonnet, and up to 98.1% against other leading models, using adaptive multi-agent attack strategies.

Source