High-Actuation Spaces — SR2025 Agenda Snapshot

One-sentence summary: Mech interp and alignment assume a stable “computational substrate” (linear algebra on GPUs). If later AI uses a different substrate (e.g. something neuromorphic), methods like probes and steering will not transfer. It is therefore better to infer goals via a “telic DAG”, which abstracts over substrates and so sidesteps the question of how to define intermediate representations. Category theory is intended to provide guarantees that this abstraction is valid.

Theory of Change

Sufficiently complex mindlike entities can alter their goals in ways that cannot be predicted or accounted for under substrate-dependent descriptions of the kind sought in mechanistic interpretability. The agenda therefore uses the telic DAG to define a method analogous to factoring a causal DAG.
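
For readers unfamiliar with the analogy: factoring a causal DAG usually means writing a joint distribution as a product of per-node conditionals, each given its parents. The sketch below shows only that standard factorization on a toy three-node graph. It is not the agenda's telic-DAG method, and the graph, variable names, and probabilities are hypothetical.

```python
from itertools import product

# Hypothetical toy causal DAG: rain -> wet <- sprinkler.
PARENTS = {"rain": (), "sprinkler": (), "wet": ("rain", "sprinkler")}

# P(node = True | parent values). Illustrative numbers only.
CPT = {
    "rain": {(): 0.2},
    "sprinkler": {(): 0.1},
    "wet": {
        (False, False): 0.0,
        (False, True): 0.9,
        (True, False): 0.8,
        (True, True): 0.99,
    },
}


def joint_probability(assignment):
    """Markov factorization: P(x) = product over nodes of P(x_i | parents(x_i))."""
    p = 1.0
    for node, parents in PARENTS.items():
        parent_values = tuple(assignment[q] for q in parents)
        p_true = CPT[node][parent_values]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def all_assignments():
    """Enumerate every True/False assignment to the nodes of the DAG."""
    nodes = list(PARENTS)
    for values in product([False, True], repeat=len(nodes)):
        yield dict(zip(nodes, values))


# Sanity check: the factorized joint sums to 1 over all assignments.
total = sum(joint_probability(a) for a in all_assignments())
```

Each node contributes one local factor, so the joint is fully specified by the graph plus the per-node tables. Per the summary above, the telic DAG is meant to support an analogous decomposition while abstracting over substrates.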

Broad Approach

maths / philosophy

Target Case

pessimistic

Key People

Sahil K, Matt Farr, Aditya Arpitha Prasad, Chris Pang, Aditya Adiga, Jayson Amati, Steve Petersen, Topos, T J

Funding

Estimated FTEs: 1-10

See Also

Live theory, MoSSAIC, Topos Institute, agent-foundations

Outputs in 2025

7 items in the review. See the wiki/summaries/ entries with frontmatter agenda: high-actuation-spaces (these were generated alongside this file from the same export).

Source

Sources cited

Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running, not by hand.