LLM Robustness Leaderboard v1 -- Technical report

Pierre Peigné-Lefebvre, Quentin Feuillade-Montixi, Tom David, Nicolas Miailhe · 2025-08-13 · PRISM Eval · arXiv

Summary

Introduces PRISM Eval BET, an automated red-teaming tool that uses Dynamic Adversarial Optimization and achieves a 100% attack success rate against 37 of 41 state-of-the-art LLMs. The report also presents fine-grained robustness metrics and a primitive-level vulnerability analysis.

Key Result

The automated red-teaming system achieved a 100% attack success rate against 37 of 41 tested LLMs. Despite this widespread vulnerability, attack difficulty varied more than 300-fold across models.

Source

Link: https://arxiv.org/abs/2508.06296
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- various-redteams — Evals
Editorial blurb (verbatim): [LLM Robustness Leaderboard v1 \--Technical report](https://arxiv.org/abs/2508.06296)
