Aligning what? — SR2025 Agenda Snapshot
One-sentence summary: Develop alternatives to agent-level models of alignment by treating human-AI interactions, AI-assisted institutions, AI economic or cultural systems, drives within one AI, and other causal/constitutive processes as subject to alignment.
Theory of Change
Modeling multiple reality-shaping processes above and below the level of the individual AI, some of which are themselves quasi-agential (e.g. cultures) or intelligence-like (e.g. markets), will develop AI alignment into a mature science for managing the transition to an AGI civilization.
Broad Approach
behavioral / cognitive
Target Case
mixed
Orthodox Problems Addressed
Value is fragile and hard to specify, Corrigibility is anti-natural, Goals misgeneralize out of distribution, Instrumental convergence, Fair, sane pivotal processes
Key People
Richard Ngo, Emmett Shear, Softmax, Full Stack Alignment, AI Objectives Institute, Sahil, TJ, Andrew Critch, ACS Research, Jan Kulveit
Funding
Future of Life Institute, Emmett Shear
Estimated FTEs: 5-10
See Also
- theory-for-aligning-multiple-ais
- aligning-to-context
- aligned-to-who
Outputs in 2025
13 items in the review. See the `wiki/summaries/` entries with frontmatter `agenda: aligning-what` (these were generated alongside this file from the same export).
Source
- Row in `shallow-review-2025/agendas.csv` (name = Aligning what?) — Shallow Review of Technical AI Safety 2025.
Related Pages
- ai-safety
- aligned-to-who
- aligning-to-context
- theory-for-aligning-multiple-ais
- aligning-to-the-social-contract
- tools-for-aligning-multiple-ais
- agent-foundations
Sources cited
Primary URLs harvested from this page's summary references. Auto-generated by `scripts/backfill_citations.py`; edit by re-running, not by hand.
- Summary: AI Safety (Wikipedia) — referenced as [[ai-safety]]