AI Safety Compendium

Rank-1 LoRAs Encode Interpretable Reasoning Signals

27 Apr 2026 · 1 min read

Jake Ward, Paul Riechers, Adam Shai — 2025-11-10

Source

Link: http://arxiv.org/abs/2511.06739
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- representation-structure-and-geometry — White-box safety (i.e. Interpretability)

Related Pages

representation-structure-and-geometry
