A Pragmatic Way to Measure Chain-of-Thought Monitorability

Scott Emmons, Roland S. Zimmermann, David K. Elson, Rohin Shah — 2025-10-28 — Google DeepMind — arXiv

Summary

Proposes metrics for two components of Chain-of-Thought (CoT) monitorability, legibility and coverage, scored with an LLM-based autorater. It validates the metrics against synthetic degradations of the CoT and applies them to frontier models on challenging benchmarks.

Key Result

Frontier models exhibit high default monitorability on challenging benchmarks when measured using the proposed legibility and coverage metrics.

Source

Link: https://arxiv.org/abs/2510.23966
Listed in the Shallow Review of Technical AI Safety 2025 under two agendas:
- google-deepmind — Labs (giant companies)
- chain-of-thought-monitoring — Black-box safety (understand and control current model behaviour) / Iterative alignment
