Video and transcript of talk on automating alignment research

Joe Carlsmith — 2025-04-30 — Anthropic — LessWrong

Summary

Technical talk proposing a framework for safely automating alignment research, distinguishing empirical research (easier to evaluate via experiments) from conceptual research (harder to evaluate), and arguing empirical automation can bootstrap conceptual automation.

Source

Link: https://lesswrong.com/posts/TQbptN7F4ijPnQRLy/video-and-transcript-of-talk-on-automating-alignment
Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- supervising-ais-improving-ais — Make AI solve it

AI Safety Compendium
