SLT for AI Safety
Jesse Hoogland — 2025-07-01 — Timaeus — LessWrong
Summary
Introduces a research agenda for applying Singular Learning Theory (SLT) to AI safety. It proposes that understanding the geometry of the loss landscape enables both interpretability (reading the algorithms a model has learned) and alignment (controlling which algorithms are learned by intervening on the training data).
Source
- Link: https://lesswrong.com/posts/J7CyENFYXPxXQpsnD/slt-for-ai-safety
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- learning-dynamics-and-developmental-interpretability — White-box safety (i.e. Interpretability)