AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons
Shaona Ghosh, Heather Frase, Adina Williams, Sarah Luger, Paul Röttger, Fazl Barez, … (+96 more) — 2025-04-18 — MLCommons — arXiv
Summary
Introduces AILuminate v1.0, presented as the first comprehensive industry-standard benchmark for assessing the risk and reliability of AI systems. It covers 12 hazard categories, including violent crimes, weapons, CSAM, and specialized advice, and combines extensive prompt datasets with a novel entropy-based evaluation framework and a five-tier grading scale.
Source
- Link: https://arxiv.org/abs/2503.05731
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- capability-evals — Evals