Video and transcript of talk on automating alignment research
Joe Carlsmith — 2025-04-30 — Anthropic — LessWrong
Summary
Technical talk proposing a framework for safely automating alignment research. It distinguishes empirical research (easier to evaluate, via experiments) from conceptual research (harder to evaluate), and argues that automated empirical research can bootstrap the automation of conceptual research.
Source
- Link: https://lesswrong.com/posts/TQbptN7F4ijPnQRLy/video-and-transcript-of-talk-on-automating-alignment
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- supervising-ais-improving-ais — Make AI solve it