Automating AI Safety: What we can do today
Matthew Shinkle, Eyon Jang, Jacques Thibodeau — 2025-07-25 — SPAR, PIBBSS — LessWrong
Summary
Proposes concrete infrastructure projects to improve AI coding agents’ ability to execute technical AI safety experiments, including compiled monofiles, indexable documentation, iteratively refined package guides, structured sandbox environments, and focused benchmarks for evaluating research automation capabilities.
Source
- Link: https://lesswrong.com/posts/FqpAPC48CzAtvfx5C/automating-ai-safety-what-we-can-do-today
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
  - black-box-make-ai-solve-it — Black-box safety (understand and control current model behaviour) / Iterative alignment