We should try to automate AI safety work asap
Marius Hobbhahn — 2025-04-26 — Apollo Research — LessWrong
Summary
Proposes a framework and concrete roadmaps for automating AI safety research across multiple domains (evals, red-teaming, monitoring, interpretability), distinguishing between pipeline automation of human-designed processes and research automation of ideation itself.
Source
- Link: https://lesswrong.com/posts/W3KfxjbqBAnifBQoi/we-should-try-to-automate-ai-safety-work-asap
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- capability-evals — Evals