Circuits in Superposition
Lucius Bushnaq, jake_mendel — 2024-10-14 — Apollo Research
Source
- Link: https://www.lesswrong.com/posts/roE7SHjFWEoMcGZKd/circuits-in-superposition-compressing-many-small-neural
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- reverse-engineering — White-box safety (i.e. Interpretability)