AI Safety Compendium

Open Source Replication of Anthropic's Crosscoder paper for model-diffing

27 Apr 2026, 1 min read

Connor Kissane, robertzk, Arthur Conmy, Neel Nanda — 2024-10-27

Source

Link: https://www.lesswrong.com/posts/srt6JXsRMtmqAJavD/open-source-replication-of-anthropic-s-crosscoder-paper-for
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- model-diffing — White-box safety (i.e. Interpretability)

Related Pages

model-diffing
