LLM Robustness Leaderboard v1 -- Technical report
Pierre Peigné-Lefebvre, Quentin Feuillade-Montixi, Tom David, Nicolas Miailhe · 2025-08-13 · PRISM Eval · arXiv
Summary
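Introduces PRISM Eval BET (Behavior Elicitation Tool), an automated red-teaming system based on Dynamic Adversarial Optimization that achieves a 100% attack success rate (ASR) against 37 of 41 state-of-the-art LLMs. Beyond binary success, the report proposes a fine-grained robustness metric, the average number of attempts needed to elicit a harmful behavior, and a primitive-level vulnerability analysis that identifies which jailbreaking techniques work best for which hazard categories.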
Key Result
The automated red-teaming system achieved a 100% attack success rate against 37 of the 41 tested LLMs. Measured by the average number of attempts needed to elicit harmful behavior, attack difficulty varied by more than 300-fold across models despite universal vulnerability.
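To make the two metrics concrete, here is a minimal Python sketch. It assumes a hypothetical per-behavior record (`BehaviorTrial`) storing how many attempts were made and whether the behavior was eventually elicited. ASR is taken as the fraction of behaviors elicited at least once, and attack difficulty is approximated as the mean attempts over successful behaviors. This is one plausible reading, not the report's actual estimator, and the toy numbers are invented purely to show what a 300-fold gap looks like.

```python
from dataclasses import dataclass


@dataclass
class BehaviorTrial:
    """Hypothetical record: outcome of attacking one model on one target behavior."""
    attempts: int    # adversarial attempts made for this behavior
    succeeded: bool  # True if any attempt elicited the behavior


def attack_success_rate(trials: list[BehaviorTrial]) -> float:
    """Fraction of target behaviors elicited at least once."""
    return sum(t.succeeded for t in trials) / len(trials)


def mean_attempts_to_success(trials: list[BehaviorTrial]) -> float:
    """Average attempts over behaviors that were eventually elicited.

    Assumption: a plain average over successes only; failed behaviors are
    ignored, which the report's estimator may handle differently.
    """
    successes = [t.attempts for t in trials if t.succeeded]
    if not successes:
        return float("inf")  # never broken: treat as infinitely hard
    return sum(successes) / len(successes)


def difficulty_ratio(per_model: dict[str, float]) -> float:
    """Ratio of the hardest model's mean attempts to the easiest model's."""
    return max(per_model.values()) / min(per_model.values())


# Invented toy data: both models are fully broken (ASR = 1.0),
# but one needs far more attempts than the other.
toy = {
    "model_a": [BehaviorTrial(1, True), BehaviorTrial(3, True), BehaviorTrial(2, True)],
    "model_b": [BehaviorTrial(400, True), BehaviorTrial(800, True), BehaviorTrial(600, True)],
}
asr = {m: attack_success_rate(ts) for m, ts in toy.items()}
cost = {m: mean_attempts_to_success(ts) for m, ts in toy.items()}
ratio = difficulty_ratio(cost)
```

Here both toy models have identical ASR, yet `model_b` needs 300 times as many attempts on average. This is why an attempts-based metric can separate models that a binary success rate treats as equally vulnerable.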
Source
- Link: https://arxiv.org/abs/2508.06296
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
  - various-redteams (Evals)
- Editorial blurb (verbatim):
[LLM Robustness Leaderboard v1 \--Technical report](https://arxiv.org/abs/2508.06296)