High-Actuation Spaces — SR2025 Agenda Snapshot
One-sentence summary: Mechanistic interpretability and alignment work assume a stable “computational substrate” (linear algebra on GPUs). If later AI systems run on different substrates (e.g. neuromorphic hardware), methods such as probes and steering will not transfer. The agenda therefore proposes inferring goals via a “telic DAG” that abstracts over substrates, which sidesteps the question of how to define intermediate representations. Category theory is meant to provide guarantees that this abstraction is valid.
Theory of Change
Sufficiently complex mindlike entities can alter their goals in ways that cannot be predicted or accounted for under substrate-dependent descriptions of the kind sought in mechanistic interpretability. The proposed response is to use the telic DAG to define a method analogous to factoring a causal DAG.
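The agenda does not spell out how a telic DAG is factored, so the sketch below only illustrates the standard operation it is compared to: factoring a causal DAG, where the joint distribution splits into one conditional per node given its parents, P(x) = ∏ P(x_i | parents(x_i)). The graph, node names (`rain`, `sprinkler`, `wet_grass`), probabilities, and helper functions are all hypothetical and chosen for illustration; nothing here is the agenda's own method.

```python
from itertools import product

# Hypothetical causal DAG over binary variables:
# rain -> sprinkler, rain -> wet_grass, sprinkler -> wet_grass.
PARENTS = {
    "rain": (),
    "sprinkler": ("rain",),
    "wet_grass": ("rain", "sprinkler"),
}

# Illustrative conditional probability tables: P(node = 1 | parent values).
P_TRUE = {
    "rain": {(): 0.2},
    "sprinkler": {(0,): 0.4, (1,): 0.01},
    "wet_grass": {(0, 0): 0.0, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.99},
}


def topological_order(parents):
    """Order nodes so every parent precedes its children; raise if the graph has a cycle."""
    order, placed = [], set()
    while len(order) < len(parents):
        progressed = False
        for node, pa in parents.items():
            if node not in placed and all(p in placed for p in pa):
                order.append(node)
                placed.add(node)
                progressed = True
        if not progressed:
            raise ValueError("graph has a cycle, so it is not a DAG")
    return order


def joint_probability(assignment, parents=PARENTS, p_true=P_TRUE):
    """Joint probability as the product of per-node conditionals (the DAG factorization)."""
    prob = 1.0
    for node in topological_order(parents):
        parent_vals = tuple(assignment[p] for p in parents[node])
        p1 = p_true[node][parent_vals]
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob


if __name__ == "__main__":
    nodes = list(PARENTS)
    # Summing the factorized joint over every assignment should give 1.
    total = sum(
        joint_probability(dict(zip(nodes, vals)))
        for vals in product((0, 1), repeat=len(nodes))
    )
    print(round(total, 10))  # 1.0
```

The point of the factorization is that each local conditional can be inspected on its own while the whole still reconstructs the joint; the agenda's analogy presumably aims for a similar decomposition over goals rather than over random variables, but that reading is an assumption.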
Broad Approach
maths / philosophy
Target Case
pessimistic
Key People
Sahil K, Matt Farr, Aditya Arpitha Prasad, Chris Pang, Aditya Adiga, Jayson Amati, Steve Petersen, Topos, T J
Funding
Estimated FTEs: 1-10
See Also
Live theory, MoSSAIC, Topos Institute, agent-foundations
Outputs in 2025
7 item(s) in the review. See the wiki/summaries/ entries with frontmatter agenda: high-actuation-spaces (these were generated alongside this file from the same export).
Source
- Row in shallow-review-2025/agendas.csv (name = High-Actuation Spaces), Shallow Review of Technical AI Safety 2025.
Related Pages
- ai-safety
- agent-foundations
- asymptotic-guarantees
- behavior-alignment-theory
- heuristic-explanations
- natural-abstractions
- other-corrigibility
- the-learning-theoretic-agenda
- tiling-agents
Sources cited
Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running, not by hand.
- Summary: AI Safety (Wikipedia), referenced as [[ai-safety]]