Six Thoughts on AI Safety
Boaz Barak — 2025-01-24 — Harvard University — LessWrong
Summary
Position paper presenting six non-consensus views on AI safety: safety will not be solved by default, nor by AI scientists alone; alignment should aim for robust compliance with explicit specifications rather than abstract values; detection and monitoring matter more than prevention; interpretability is not necessary for alignment; and humanity can survive an unaligned superintelligence provided aligned ASI dominates compute resources.
Source
- Link: https://lesswrong.com/posts/3jnziqCF3vA2NXAKp/six-thoughts-on-ai-safety
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
  - model-specs-and-constitutions — Black-box safety (understand and control current model behaviour) / Model psychology