Towards Alignment Auditing as a Numbers-Go-Up Science

Sam Marks — 2025-08-04 — Anthropic — LessWrong

Summary

Proposes that alignment auditing become a ‘numbers-go-up’ field organized around one progress metric, the performance of auditing agents on standard testbeds, much as other ML fields track concrete metrics such as perplexity or throughput.

Source