The Geometry of Self-Verification in a Task-Specific Reasoning Model
Andrew Lee, Lihao Sun, Chris Wendler, Fernanda Viégas, Martin Wattenberg — 2025-04-19 — Google Research
Source
- Link: https://arxiv.org/pdf/2504.14379
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- representation-structure-and-geometry — White-box safety (i.e. Interpretability)