Six Thoughts on AI Safety
Boaz Barak — 2025-01-24 — Harvard University — LessWrong
Summary
Position paper presenting six non-consensus views on AI safety: safety will not be solved by default, nor by AI scientists alone; alignment should aim for robust compliance with explicit specifications rather than abstract values; detection and monitoring matter more than prevention; interpretability is not necessary for alignment; and humanity can survive an unaligned superintelligence provided aligned ASI dominates compute resources.
Source
- Link: https://lesswrong.com/posts/3jnziqCF3vA2NXAKp/six-thoughts-on-ai-safety
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
  - model-specs-and-constitutions — Black-box safety (understand and control current model behaviour) / Model psychology