Towards Alignment Auditing as a Numbers-Go-Up Science
Sam Marks — 2025-08-04 — Anthropic — LessWrong
Summary
Proposes that alignment auditing become a ‘numbers-go-up’ field organized around a single progress metric: how well auditing agents perform on standard testbeds, much as other ML fields track concrete metrics such as perplexity or throughput.
Source
- Link: https://lesswrong.com/posts/bGYQgBPEyHidnZCdE/towards-alignment-auditing-as-a-numbers-go-up-science
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- other-evals — Evals