Sparse Coding — SR2025 Agenda Snapshot

One-sentence summary: Decompose an LLM’s polysemantic residual-stream activations into a sparse linear combination of monosemantic “features”, each corresponding to an interpretable concept.
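
As a minimal illustration of the idea (a sketch, not any particular lab’s implementation), a sparse autoencoder of this kind can be written in a few lines of PyTorch. The dimensions and the L1 coefficient below are placeholder assumptions.

```python
# Minimal sparse-autoencoder sketch. d_model, n_features and l1_coeff are
# illustrative placeholders, not values from any cited paper.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.W_enc = nn.Linear(d_model, n_features)  # activations -> feature coefficients
        self.W_dec = nn.Linear(n_features, d_model)  # features -> reconstructed activations

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.W_enc(x))   # sparse, non-negative feature activations
        x_hat = self.W_dec(f)           # linear reconstruction of the residual-stream vector
        return x_hat, f

# Training objective: reconstruction error plus an L1 sparsity penalty on f.
sae = SparseAutoencoder(d_model=768, n_features=16384)
x = torch.randn(32, 768)                # stand-in batch of residual-stream activations
x_hat, f = sae(x)
l1_coeff = 1e-3
loss = ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().mean()
loss.backward()
```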

Theory of Change

Get a principled decomposition of an LLM’s activations into atomic components → use those components to identify deception and other misbehaviors (see the monitoring sketch below).
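
One hypothetical way this could cash out: given a trained SAE and a feature that human analysis (e.g. via max-activating examples) has associated with deception, flag inputs on which that feature fires strongly. The feature index and threshold below are illustrative assumptions, not values from any real model.

```python
# Hypothetical monitoring sketch on top of a trained SAE (see sketch above).
import torch

DECEPTION_FEATURE = 1234   # assumed index of a human-labelled deception-related feature
THRESHOLD = 5.0            # assumed activation threshold for flagging

def flag_suspicious(activations: torch.Tensor, sae) -> torch.Tensor:
    """Return a boolean mask over the batch marking residual-stream vectors
    whose deception-associated feature activation exceeds the threshold."""
    _, features = sae(activations)                 # shape: (batch, n_features)
    return features[:, DECEPTION_FEATURE] > THRESHOLD
```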

Broad Approach

engineering / cognitive

Target Case

average

Orthodox Problems Addressed

Value is fragile and hard to specify, Goals misgeneralize out of distribution, Superintelligence can fool human supervisors

Key People

Leo Gao, Dan Mossing, Emmanuel Ameisen, Jack Lindsey, Adam Pearce, Thomas Heap, Abhinav Menon, Kenny Peng, Tim Lawson

Funding

Roughly everyone: frontier labs, LTFF, Coefficient Giving, etc.

Estimated FTEs: 50-100

Critiques

Sparse Autoencoders Can Interpret Randomly Initialized Transformers; The Sparse Autoencoders bubble has popped, but they are still promising; Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research; Sparse Autoencoders Trained on the Same Data Learn Different Features; Why Not Just Train For Interpretability?

See Also

Concept-based interpretability, reverse-engineering

Outputs in 2025

44 items in the review. See the wiki/summaries/ entries with frontmatter agenda: sparse-coding (generated alongside this file from the same export).

Source

Sources cited

Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running, not by hand.