Tools for aligning multiple AIs — SR2025 Agenda Snapshot

One-sentence summary: Develop tools and techniques for designing and testing multi-agent AI scenarios, for auditing real-world multi-agent AI dynamics, and for aligning AIs in multi-AI settings.

Theory of Change

Addressing multi-agent AI dynamics is key to aligning near-future agents and managing their impact on the world. Feedback loops arising from multi-agent dynamics can radically reshape the future AI landscape, and they require a different toolset from single-model psychology to audit and control.

Broad Approach

engineering / behavioral

Target Case

mixed

Orthodox Problems Addressed

Goals misgeneralize out of distribution, Superintelligence can fool human supervisors, Superintelligence can hack software supervisors

Key People

Andrew Critch, Lewis Hammond, Emery Cooper, Allan Chan, Caspar Oesterheld, Vincent Conitzer, Gillian Hadfield, Nathaniel Sauerberg, Zhijing Jin

Funding

Coefficient Giving, DeepMind, Cooperative AI Foundation

Estimated FTEs: 10–15

See Also

theory-for-aligning-multiple-ais, aligning-what

Outputs in 2025

12 items in the review. See the wiki/summaries/ entries whose frontmatter contains agenda: tools-for-aligning-multiple-ais (these were generated alongside this file from the same export).
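The frontmatter-based lookup described above can be sketched as a small script. This is a hypothetical illustration, not part of the export tooling: the file layout (wiki/summaries/*.md), the `---`-delimited YAML frontmatter, and the function name `summaries_for_agenda` are all assumptions.

```python
from pathlib import Path

def summaries_for_agenda(root, agenda):
    """Yield paths of summary files whose frontmatter declares the given agenda.

    Assumes each Markdown file begins with a frontmatter block delimited by
    `---` lines and containing a line of the form `agenda: <slug>`.
    (Illustrative sketch; the real export format may differ.)
    """
    for path in sorted(Path(root).glob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        if not lines or lines[0].strip() != "---":
            continue  # no frontmatter block
        for line in lines[1:]:
            if line.strip() == "---":  # closing delimiter: stop scanning
                break
            if line.strip() == f"agenda: {agenda}":
                yield path
                break

# Usage (hypothetical paths):
# for p in summaries_for_agenda("wiki/summaries", "tools-for-aligning-multiple-ais"):
#     print(p.name)
```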

Source

Sources cited

Primary URLs harvested from this page's summary references. Auto-generated by scripts/backfill_citations.py; to change it, re-run the script rather than editing by hand.