Video and transcript of talk on automating alignment research

Joe Carlsmith — 2025-04-30 — Anthropic — LessWrong

Summary

Technical talk proposing a framework for safely automating alignment research. It distinguishes empirical research (easier to evaluate via experiments) from conceptual research (harder to evaluate), and argues that automating empirical research can bootstrap the automation of conceptual research.

Source