What We Learned Trying to Diff Base and Chat Models (And Why It Matters)
Clément Dumas, Julian Minder, Neel Nanda — 2025-06-30 — MATS Program
Source
- Link: https://www.lesswrong.com/posts/xmpauEXEerzYcJKNm/what-we-learned-trying-to-diff-base-and-chat-models-and-why
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
  - model-diffing: White-box safety (i.e. Interpretability)