A Pragmatic Way to Measure Chain-of-Thought Monitorability

Scott Emmons, Roland S. Zimmermann, David K. Elson, Rohin Shah — 2025-10-28 — Google DeepMind — arXiv

Summary

Proposes metrics for two components of chain-of-thought (CoT) monitorability, legibility and coverage, each scored by an LLM-based autorater; validates the approach with synthetic degradations; and applies it to frontier models on challenging benchmarks.

Key Result

Frontier models exhibit high default monitorability on challenging benchmarks when measured using the proposed legibility and coverage metrics.

Source