Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods

Markov Grey, Charbel-Raphaël Segerie — 2025-05-08 — arXiv

Summary

A systematic literature review consolidating the field of AI safety evaluations, proposing a taxonomy around three dimensions: what properties are measured (capabilities, propensities, control), how they are measured (behavioral and internal techniques), and how measurements integrate into governance frameworks.

Source

Link: https://arxiv.org/abs/2505.05541
Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- capability-evals — Evals
