SLT for AI Safety

Jesse Hoogland — 2025-07-01 — Timaeus — LessWrong

Summary

Introduces a research agenda for applying Singular Learning Theory (SLT) to AI safety, proposing that understanding loss landscape geometry enables both interpretability (reading learned algorithms) and alignment (using training data interventions to control which algorithms are learned).

Source