AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons

Shaona Ghosh, Heather Frase, Adina Williams, Sarah Luger, Paul Röttger, Fazl Barez, … (+96 more) — 2025-04-18 — MLCommons — arXiv

Summary

Introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI system risk and reliability. The benchmark covers 12 hazard categories, including violent crimes, indiscriminate weapons, child sexual exploitation, and specialized advice, and combines extensive prompt datasets with a novel entropy-based evaluation framework and a five-tier grading scale.

Source