Tools for aligning multiple AIs — SR2025 Agenda Snapshot
One-sentence summary: Develop tools and techniques for designing and testing multi-agent AI scenarios, for auditing real-world multi-agent AI dynamics, and for aligning AIs in multi-AI settings.
Theory of Change
Addressing multi-agent AI dynamics is key to aligning near-future agents and their impact on the world. Feedback loops arising from multi-agent dynamics can radically reshape the future AI landscape, and auditing and controlling them requires a different toolset from model psychology.
Broad Approach
engineering / behavioral
Target Case
mixed
Orthodox Problems Addressed
Goals misgeneralize out of distribution, Superintelligence can fool human supervisors, Superintelligence can hack software supervisors
Key People
Andrew Critch, Lewis Hammond, Emery Cooper, Alan Chan, Caspar Oesterheld, Vincent Conitzer, Gillian Hadfield, Nathaniel Sauerberg, Zhijing Jin
Funding
Coefficient Giving, DeepMind, Cooperative AI Foundation
Estimated FTEs: 10–15
See Also
theory-for-aligning-multiple-ais, aligning-what
Outputs in 2025
12 items in the review. See the wiki/summaries/ entries with frontmatter agenda: tools-for-aligning-multiple-ais (these were generated alongside this file from the same export).
Source
- Row in shallow-review-2025/agendas.csv (name = Tools for aligning multiple AIs) — Shallow Review of Technical AI Safety 2025.
Related Pages
- ai-safety
- aligning-what
- theory-for-aligning-multiple-ais
- aligned-to-who
- aligning-to-context
- aligning-to-the-social-contract
Sources cited
Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running, not by hand.
- Summary: AI Safety (Wikipedia) — referenced as [[ai-safety]]