Alignment to Whom

The “alignment to whom” question is the structural decomposition of AI alignment by principal-agent configuration: are we aligning one AI to one human, multiple AIs to one human, one AI to many humans, or many AIs to many humans? The AI Safety Atlas (Ch.3 long-term-questions appendix) treats these as four distinct alignment problems with different failure modes.

This is complementary to [[coherent-extrapolated-volition|“alignment to what”]]: alignment to whom determines the target principal; alignment to what determines the value content.

The Four Configurations

Single-Single Alignment

One AI ↔ one human.

Current approaches focus on intent alignment — interpreting intended meaning rather than literal commands. This is the foundational unsolved problem; everything else builds on it.

Key challenges:

  • The human’s intent isn’t always clear, even to the human themselves
  • Specification gaps between literal commands and intended outcomes
  • Faithfully interpreting preferences without sliding into paternalism

Most current AI safety research operates at this level (see [[ai-alignment]]).
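
To make the specification-gap challenge concrete, here is a deliberately tiny toy model (the actions, rewards, and numbers are all hypothetical, not from the Atlas): an agent maximizing the literal objective we wrote down selects a different action than one maximizing what we actually intended.

```python
# Hypothetical toy example of a specification gap (all names and numbers
# are illustrative). A cleaning agent is rewarded for "no visible dirt"
# minus effort; covering the dirt games that proxy objective.

actions = {
    "vacuum_dirt": {"dirt_removed": True,  "dirt_visible": False, "effort": 2},
    "cover_dirt":  {"dirt_removed": False, "dirt_visible": False, "effort": 1},
    "do_nothing":  {"dirt_removed": False, "dirt_visible": True,  "effort": 0},
}

def literal_reward(o):
    """The objective as literally specified: hide dirt, cheaply."""
    return (10 if not o["dirt_visible"] else 0) - o["effort"]

def intended_reward(o):
    """What the principal actually wanted: dirt removed."""
    return (10 if o["dirt_removed"] else 0) - o["effort"]

best_literal = max(actions, key=lambda a: literal_reward(actions[a]))
best_intended = max(actions, key=lambda a: intended_reward(actions[a]))
print(best_literal, best_intended)  # cover_dirt vacuum_dirt
```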

Single-Multi Alignment

Multiple AIs ↔ one human.

A coordinated AI system serving an individual. Includes:

  • ASIs composed of cooperating smaller intelligences
  • Personal AI assistants with multiple specialized sub-AIs
  • AI agents coordinating to fulfill one principal’s goals

Critical caveat: still requires solving single-single first. Even cooperating AIs need to faithfully serve their human principal — single-single alignment failures multiply.
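
One way to see “failures multiply” concretely, under a simplifying independence assumption (the assumption is illustrative, not from the Atlas): if each sub-AI faithfully serves the principal with probability p, the chance that all k sub-AIs do falls as p^k.

```python
# Back-of-envelope sketch (independence assumption is illustrative):
# probability that all k sub-AIs faithfully serve the principal,
# if each is faithful independently with probability p.
p = 0.99
for k in (1, 10, 100):
    print(f"k={k:>3}: P(all faithful) = {p ** k:.3f}")
# k=  1: 0.990,  k= 10: 0.904,  k=100: 0.366
```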

The Atlas’s structural concern: “Ideally, no individual or small group should control superintelligence.” Single-multi alignment, taken literally, enables unilateral control of advanced AI by individuals — not necessarily a desirable property.

Multi-Single Alignment

One AI ↔ many humans.

A single AI system serving many people. The challenge: aggregating individual preferences creates contradictions (different people want incompatible things).

Promising approach: align the AI to higher-level institutional principles and values, mirroring how democratic institutions operate through transparency and accountability rather than through direct preference optimization.

This is the configuration of:

  • Public-facing AI assistants (ChatGPT serving many users)
  • Government-deployed AI in regulatory or service roles
  • AI in democratic institutions

Key open problem: how to handle conflicts. Voting? Deliberation? Market mechanisms? Each implies a different alignment architecture.
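
The classic obstacle here is that majority voting alone cannot resolve these conflicts. A standard Condorcet-cycle sketch (the users and options below are hypothetical) shows pairwise majority voting producing no consistent collective ranking at all:

```python
# Standard Condorcet-cycle illustration (users and options hypothetical).
# Three users rank options A, B, C; pairwise majority voting cycles.

from itertools import combinations

prefs = {
    "user1": ["A", "B", "C"],  # earlier in the list = more preferred
    "user2": ["B", "C", "A"],
    "user3": ["C", "A", "B"],
}

def majority_prefers(x, y):
    """True if a strict majority of users rank x above y."""
    votes = sum(p.index(x) < p.index(y) for p in prefs.values())
    return votes > len(prefs) / 2

for x, y in combinations("ABC", 2):
    winner, loser = (x, y) if majority_prefers(x, y) else (y, x)
    print(f"majority prefers {winner} over {loser}")
# Prints: A over B, C over A, B over C -- a cycle with no stable winner.
```

Different mechanisms (runoffs, scoring rules, deliberation, markets) break this cycle in different places, which is one concrete sense in which each conflict-handling choice implies a different alignment architecture.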

Multi-Multi Alignment

Multiple AIs ↔ multiple humans.

The most general configuration — and the realistic future. Multiple AI systems interacting with multiple humans, with potential conflicts of interest at every level.

The misalignment-vs-misuse distinction blurs here:

  • Misalignment risk — AI gaining power over humans
  • Misuse risk — humans gaining power over others via AI
  • Multi-multi — these can be the same dynamic, just framed differently

Success requires system design preventing problematic power concentrations while enabling beneficial cooperation across the entire human-AI network.

The Independence-Failure Insight

The Atlas’s deepest point: perfect individual alignment cannot guarantee safe collective behavior.

“Different principals may have conflicting interests, or systems might fail to coordinate despite aligned goals. Perfect driver-law alignment doesn’t prevent traffic jams or accidents. Multi-agent system failures represent distinct failure modes beyond individual agent problems.”
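
The driver analogy can be made precise with a textbook Pigou-style congestion game (a toy sketch; the routes and cost functions are illustrative): every driver best-responds individually, yet the selfish equilibrium is collectively worse than a coordinated assignment.

```python
# Pigou-style congestion game (textbook toy model; numbers illustrative).
# Route A: constant cost 1.0 per driver. Route B: cost n_b / N, rising with load.

N = 100  # drivers

def total_cost(n_b):
    """Total travel cost when n_b of the N drivers take route B."""
    n_a = N - n_b
    return n_a * 1.0 + n_b * (n_b / N)

# Selfish equilibrium: a driver switches to B whenever B is no worse than A.
# B's per-driver cost only reaches A's (1.0) when everyone is on it.
equilibrium = total_cost(N)                          # 100.0

# Coordinated optimum: pick the split that minimizes total cost.
optimum = min(total_cost(k) for k in range(N + 1))   # 75.0 at k = 50

print(f"equilibrium: {equilibrium}, coordinated optimum: {optimum}")
```

No individual driver is misbehaving, yet total cost at equilibrium is a third higher than at the optimum; the failure lives entirely in the interaction structure.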

Each configuration introduces failure modes the previous one didn’t have:

  • Single-single — misalignment with the individual principal
  • Single-multi — single-single failures + sub-AI coordination failures
  • Multi-single — preference aggregation, conflicting principals
  • Multi-multi — all of the above + emergent multi-agent dynamics + power concentration risks

Why This Matters Strategically

The four configurations require different research agendas:

  • Single-single — intent alignment and faithful preference interpretation
  • Single-multi — sub-AI coordination layered on single-single guarantees
  • Multi-single — preference aggregation and institutional design
  • Multi-multi — multi-agent dynamics, cooperation infrastructure, and safeguards against power concentration

The Atlas’s framing matches the SR2025 review’s organization: SR2025 includes an entire “Multi-agent first” section, recognizing that single-single alignment alone is insufficient.
