MoSSAIC: AI Safety After Mechanism
Matt Farr, Aditya Arpitha Prasad, Chris Pang, Aditya Adiga, Jayson Amati, Sahil K — 2025-07-01 — ODYSSEY 2025 Conference
Summary
Critiques the causal-mechanistic paradigm in AI safety (particularly mechanistic interpretability), argues that this paradigm will fail as intelligence scales, and proposes MoSSAIC (Management of Substrate-Sensitive AI Capabilities) as a supplementary framework for addressing the paradigm's limitations when dealing with evasive intelligence.
Source
- Link: https://openreview.net/forum?id=n7WYSJ35FU
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- high-actuation-spaces — Theory