AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons

Shaona Ghosh, Heather Frase, Adina Williams, Sarah Luger, Paul Röttger, Fazl Barez, … (+96 more) — 2025-04-18 — MLCommons — arXiv

Summary

Introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI system risk and reliability across 12 hazard categories including violent crimes, weapons, CSAM, and specialized advice, using extensive prompt datasets and a novel entropy-based evaluation framework with a five-tier grading scale.

Source

Link: https://arxiv.org/abs/2503.05731
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- capability-evals — Evals
