AI Safety Compendium

Cross-Architecture Model Diffing with Crosscoders: Unsupervised Discovery of Differences Between LLMs

27 Apr 2026 · 1 min read

Source

Link: https://openreview.net/forum?id=ZB84SvrZB8%20
Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- model-diffing — White-box safety (i.e. Interpretability)

Related Pages

model-diffing
