Tools for aligning multiple AIs — SR2025 Agenda Snapshot

One-sentence summary: Develop tools and techniques for designing and testing multi-agent AI scenarios, for auditing real-world multi-agent AI dynamics, and for aligning AIs in multi-AI settings.

Theory of Change

Addressing multi-agent AI dynamics is key to aligning near-future agents and managing their impact on the world. Feedback loops arising from multi-agent dynamics can radically reshape the future AI landscape, and they require a different toolset from single-model psychology to audit and control.

Broad Approach

engineering / behavioral

Target Case

mixed

Orthodox Problems Addressed

Goals misgeneralize out of distribution, Superintelligence can fool human supervisors, Superintelligence can hack software supervisors

Key People

Andrew Critch, Lewis Hammond, Emery Cooper, Allan Chan, Caspar Oesterheld, Vincent Conitzer, Gillian Hadfield, Nathaniel Sauerberg, Zhijing Jin

Funding

Coefficient Giving, DeepMind, Cooperative AI Foundation

Estimated FTEs: 10–15

See Also

theory-for-aligning-multiple-ais, aligning-what

Outputs in 2025

12 items in the review. See the wiki/summaries/ entries whose frontmatter contains agenda: tools-for-aligning-multiple-ais (these were generated alongside this file from the same export).
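The frontmatter-based lookup described above can be sketched as a small script. This is a hypothetical illustration, not part of the export tooling: the file layout (wiki/summaries/*.md), the `---`-delimited YAML frontmatter, and the function name `summaries_for_agenda` are all assumptions.

```python
from pathlib import Path

def summaries_for_agenda(root, agenda):
    """Yield paths of summary files whose frontmatter declares the given agenda.

    Assumes each Markdown file begins with a frontmatter block delimited by
    `---` lines and containing a line of the form `agenda: <slug>`.
    (Illustrative sketch; the real export format may differ.)
    """
    for path in sorted(Path(root).glob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        if not lines or lines[0].strip() != "---":
            continue  # no frontmatter block
        for line in lines[1:]:
            if line.strip() == "---":  # closing delimiter: stop scanning
                break
            if line.strip() == f"agenda: {agenda}":
                yield path
                break

# Usage (hypothetical paths):
# for p in summaries_for_agenda("wiki/summaries", "tools-for-aligning-multiple-ais"):
#     print(p.name)
```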

Source

Sources cited

Primary URLs harvested from this page's summary references. Auto-generated by scripts/backfill_citations.py; to change it, re-run the script rather than editing by hand.