Cross-Architecture Model Diffing with Crosscoders: Unsupervised Discovery of Differences Between LLMs
Source
- Link: https://openreview.net/forum?id=ZB84SvrZB8
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- model-diffing — White-box safety (i.e. Interpretability)