Rank-1 LoRAs Encode Interpretable Reasoning Signals
Jake Ward, Paul Riechers, Adam Shai — 2025-11-10
Source
- Link: http://arxiv.org/abs/2511.06739
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- representation-structure-and-geometry — White-box safety (i.e. Interpretability)