AI Safety Compendium

What We Learned Trying to Diff Base and Chat Models (And Why It Matters)

27 Apr 2026 · 1 min read

Clément Dumas, Julian Minder, Neel Nanda — 2025-06-30 — MATS Program

Source

Link: https://www.lesswrong.com/posts/xmpauEXEerzYcJKNm/what-we-learned-trying-to-diff-base-and-chat-models-and-why
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- model-diffing — White-box safety (i.e. Interpretability)

Related Pages

model-diffing
