RedDebate: Safer Responses through Multi-Agent Red Teaming Debates

Ali Asad, Stephen Obadinma, Radin Shayanfar, Xiaodan Zhu — 2025-06-04 — arXiv

Summary

Introduces RedDebate, a fully automated multi-agent debate framework in which LLMs with memory modules use collaborative argumentation to identify and mitigate unsafe behaviors through red-teaming. The authors report substantial reductions in unsafe outputs on safety benchmarks.
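
As a rough illustration of the idea summarized above, the following Python sketch runs a toy debate loop in which replies flagged as unsafe add a lesson to a shared memory that later turns can read. It is not the paper's implementation: the names `run_debate`, `stub_agent`, `stub_judge`, the memory format, and the round structure are all assumptions made for this example, and the stub agent stands in for a real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical types (not from the paper): an agent maps
# (prompt, transcript, memory) to a reply, and a judge returns True
# when a reply is considered unsafe.
Agent = Callable[[str, List[str], List[str]], str]
Judge = Callable[[str], bool]


@dataclass
class DebateResult:
    transcript: List[str] = field(default_factory=list)
    unsafe_turns: int = 0


def run_debate(prompt: str, agents: List[Agent], is_unsafe: Judge,
               memory: List[str], rounds: int = 2) -> DebateResult:
    """Run a toy debate; each flagged reply adds a lesson to shared memory."""
    result = DebateResult()
    for _ in range(rounds):
        for agent in agents:
            reply = agent(prompt, result.transcript, memory)
            result.transcript.append(reply)
            if is_unsafe(reply):
                result.unsafe_turns += 1
                # Assumed memory format: one short textual lesson per failure.
                memory.append(f"Avoid responses like: {reply!r}")
    return result


def stub_agent(prompt: str, transcript: List[str], memory: List[str]) -> str:
    # Stand-in for an LLM call: answers unsafely until memory holds a lesson.
    return "I can't help with that." if memory else "UNSAFE: here is how..."


def stub_judge(reply: str) -> bool:
    # Stand-in for a safety evaluator.
    return reply.startswith("UNSAFE")


memory: List[str] = []
first = run_debate("risky request", [stub_agent, stub_agent], stub_judge, memory)
second = run_debate("risky request", [stub_agent, stub_agent], stub_judge, memory)
print(first.unsafe_turns, second.unsafe_turns)
```

In this toy setup the first debate records one unsafe turn and the second records none, because the lesson stored in memory changes the stub agent's behavior. Real agents, judges, and memory modules would be far more involved.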

Key Result

RedDebate substantially reduces unsafe outputs across diverse models, with memory integration yielding further significant safety improvements beyond debate alone.

Source

Link: https://arxiv.org/abs/2506.11083
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- various-redteams — Evals
