Alignment to Whom
The “alignment to whom” question is the structural decomposition of AI alignment by principal-agent configuration: are we aligning one AI to one human, multiple AIs to one human, one AI to many humans, or many AIs to many humans? The AI Safety Atlas (Ch.3 long-term-questions appendix) treats these as four distinct alignment problems with different failure modes.
This is complementary to [[coherent-extrapolated-volition|“alignment to what”]]: alignment to whom determines the target principal; alignment to what determines the value content.
The Four Configurations
Single-Single Alignment
One AI ↔ one human.
Current approaches focus on intent alignment — interpreting intended meaning rather than literal commands. This is the foundational unsolved problem; everything else builds on it.
Key challenges:
- The human’s intent isn’t always clear, even to themselves
- Specification gaps between literal commands and intended outcomes (see the sketch at the end of this subsection)
- Faithful interpretation of preferences vs. paternalism
Most current AI safety research operates at this level (see ai-alignment).
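The specification-gap point can be made concrete with a toy reward sketch. The sensor-and-dirt setup below is an invented illustration, not an example from the Atlas: two policies satisfy the literal objective, but only one satisfies the intent behind it.

```python
# Toy specification gap (illustrative assumption, not from the Atlas):
# the literal objective is "the dirt sensor reads zero", while the
# intended outcome is "the room is actually clean".

def literal_reward(state):
    # What the principal literally specified.
    return 1.0 if state["sensor_reading"] == 0 else 0.0

def intended_reward(state):
    # What the principal actually wanted.
    return 1.0 if state["dirt_amount"] == 0 else 0.0

# Two policies the agent could follow.
clean_the_room = {"dirt_amount": 0, "sensor_reading": 0}
cover_the_sensor = {"dirt_amount": 9, "sensor_reading": 0}  # sensor blinded

for name, state in [("clean_the_room", clean_the_room),
                    ("cover_the_sensor", cover_the_sensor)]:
    print(name, literal_reward(state), intended_reward(state))

# Both policies score 1.0 on the literal objective; only one scores 1.0
# on the intended objective. Intent alignment asks the agent to pursue
# the second reward even though only the first was written down.
```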
Single-Multi Alignment
Multiple AIs ↔ one human.
A coordinated AI system serving an individual. Includes:
- ASIs composed of cooperating smaller intelligences
- Personal AI assistants with multiple specialized sub-AIs
- AI agents coordinating to fulfill one principal’s goals
Critical caveat: still requires solving single-single first. Even cooperating AIs need to faithfully serve their human principal — single-single alignment failures multiply.
The Atlas’s structural concern: “Ideally, no individual or small group should control superintelligence.” Single-multi alignment, taken literally, enables unilateral control of advanced AI by individuals — not necessarily a desirable property.
Multi-Single Alignment
One AI ↔ many humans.
A single AI system serving many people. The challenge: aggregating individual preferences creates contradictions (different people want incompatible things).
Promising approach: align the AI to higher-level institutional principles and values, mirroring how democratic institutions operate through transparency and accountability, rather than optimizing individual preferences directly.
This is the configuration of:
- Public-facing AI assistants (ChatGPT serving many users)
- Government-deployed AI in regulatory or service roles
- AI in democratic institutions
Key open problem: how to handle conflicts. Voting? Deliberation? Market mechanisms? Each implies a different alignment architecture.
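One way to see why aggregation is hard is a minimal Condorcet-cycle sketch. The preference profile below is an invented illustration (the principals and policies are assumptions, not examples from the Atlas): each principal has a coherent ranking, yet pairwise majority voting over them is cyclic, so there is no single aggregate preference for the AI to optimize.

```python
from itertools import combinations

# Three principals ranking three policies, most-preferred first.
# Illustrative profile only.
ballots = {
    "principal_1": ["A", "B", "C"],
    "principal_2": ["B", "C", "A"],
    "principal_3": ["C", "A", "B"],
}

def pairwise_winner(x, y):
    """Return which of two options a majority of principals ranks higher."""
    x_votes = sum(1 for r in ballots.values() if r.index(x) < r.index(y))
    return x if x_votes > len(ballots) / 2 else y

for x, y in combinations("ABC", 2):
    print(f"{x} vs {y}: majority prefers {pairwise_winner(x, y)}")

# Output:
# A vs B: majority prefers A
# A vs C: majority prefers C
# B vs C: majority prefers B
# The majority preference is cyclic (A > B, B > C, C > A), so no single
# "most preferred" policy exists for the AI to serve.
```

Whatever mechanism resolves such cycles (a voting rule, deliberation, a market) becomes part of the alignment target itself, which is why each choice implies a different architecture.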
Multi-Multi Alignment
Multiple AIs ↔ multiple humans.
The most general configuration — and the realistic future. Multiple AI systems interacting with multiple humans, with potential conflicts of interest at every level.
The misalignment-vs-misuse distinction blurs here:
- Misalignment risk — AI gaining power over humans
- Misuse risk — humans gaining power over others via AI
- Multi-multi — these can be the same dynamic, just framed differently
Success requires system design preventing problematic power concentrations while enabling beneficial cooperation across the entire human-AI network.
The Independence-Failure Insight
The Atlas’s deepest point: perfect individual alignment cannot guarantee safe collective behavior.
“Different principals may have conflicting interests, or systems might fail to coordinate despite aligned goals. Perfect driver-law alignment doesn’t prevent traffic jams or accidents. Multi-agent system failures represent distinct failure modes beyond individual agent problems.”
Each configuration introduces failure modes the previous one didn’t have:
- Single-single — failure to align with the individual principal
- Single-multi — single-single failures + sub-AI coordination failures
- Multi-single — preference aggregation, conflicting principals
- Multi-multi — all of the above + emergent multi-agent dynamics + power concentration risks
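The traffic analogy can be made concrete with a toy congestion game. The sketch below is purely illustrative (the routes, loads, and travel_time function are invented assumptions): every agent faithfully optimizes its own principal’s objective, yet the uncoordinated outcome is worse for everyone than a coordinated split would be.

```python
# Minimal congestion-game sketch (illustrative, not from the Atlas):
# each agent is perfectly aligned with its own principal's goal
# ("minimize my travel time"), yet uncoordinated choices leave
# everyone worse off than a coordinated split.

N_AGENTS = 10

def travel_time(route, load):
    """Route A is fast but congests with load; route B is slow but constant."""
    return 1 + load if route == "A" else 8

# Each agent reasons independently from the empty-road baseline,
# so every one of them picks route A (time 2 beats time 8).
uncoordinated = ["A" if travel_time("A", 1) < travel_time("B", 1) else "B"
                 for _ in range(N_AGENTS)]
load_a = uncoordinated.count("A")
uncoordinated_times = [travel_time(r, load_a if r == "A" else 0)
                       for r in uncoordinated]

# A coordinator splits traffic: 5 agents on A, 5 on B.
coordinated = ["A"] * 5 + ["B"] * 5
coordinated_times = [travel_time(r, 5 if r == "A" else 0) for r in coordinated]

print("uncoordinated avg time:", sum(uncoordinated_times) / N_AGENTS)  # 11.0
print("coordinated avg time:  ", sum(coordinated_times) / N_AGENTS)    # 7.0
```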
Why This Matters Strategically
The four configurations require different research agendas:
- Single-single = current alignment / rlhf / intent alignment work
- Multi-single = institutional alignment, social-choice theory in AI (aligning-to-the-social-contract SR2025 agenda)
- Multi-multi = multi-agent alignment (theory-for-aligning-multiple-ais, tools-for-aligning-multiple-ais SR2025 agendas)
The Atlas’s framing aligns with the SR2025 review’s organization: SR2025 includes a whole “Multi-agent first” section recognizing that single-single alignment alone is insufficient.
Connection to Wiki
- ai-alignment — adds the configuration dimension
- coherent-extrapolated-volition — complementary “what” question
- aligning-to-the-social-contract, theory-for-aligning-multiple-ais, tools-for-aligning-multiple-ais — SR2025 multi-agent agendas
- aligning-to-context, aligning-what, aligned-to-who — SR2025 agendas exploring related questions
- stable-totalitarianism, value-lock-in — risks of single-multi configurations gone wrong
- atlas-ch3-strategies-09-appendix-long-term-questions — primary source
Related Pages
- ai-alignment
- coherent-extrapolated-volition
- aligned-to-who
- aligning-to-the-social-contract
- theory-for-aligning-multiple-ais
- tools-for-aligning-multiple-ais
- aligning-to-context
- aligning-what
- stable-totalitarianism
- value-lock-in
- ai-safety-atlas-textbook
- atlas-ch3-strategies-09-appendix-long-term-questions
Sources cited
Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running, not by hand.
- AI Safety Atlas Ch.3 — Appendix: Long-term Questions — referenced as [[atlas-ch3-strategies-09-appendix-long-term-questions]]