You Are What You Eat — AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
Simon Pepin Lehalleur, Jesse Hoogland, Matthew Farrugia-Roberts, Susan Wei, Alexander Gietelink Oldenziel, George Wang, … (+2 more) — 2025-02-08 — arXiv
Summary
Position paper arguing that understanding how the structure of the data distribution shapes the structure of trained models is central to AI alignment, and that statistical foundations for this relationship are needed to move beyond standard evaluation toward a robust mathematical science of alignment.
Source
- Link: https://arxiv.org/abs/2502.05475
- Listed in the Shallow Review of Technical AI Safety 2025 under two agendas:
  - data-quality-for-alignment — Black-box safety (understand and control current model behaviour) / Better data
  - data-attribution — White-box safety (i.e. interpretability)
- Editorial blurb (verbatim):
  - [You Are What You Eat -- AI Alignment Requires Understanding How Data Shapes Structure and Generalisation](https://arxiv.org/abs/2502.05475)