Multiplayer Nash Preference Optimization
Fang Wu, Xu Huang, Weihao Xuan, Zhiwei Zhang, Yijia Xiao, Guancheng Wan, … (+5 more) — 2025-09-27 — Stanford University, University of Washington, AI2 — arXiv
Summary
Multiplayer Nash Preference Optimization (MNPO) extends Nash learning from human feedback (NLHF) to the multiplayer setting, formulating alignment as an n-player game in which each policy competes against a population of opponents while being regularized toward a reference model (a schematic objective is sketched below).
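A minimal sketch of the kind of objective this summary describes, assuming a direct population extension of the two-player NLHF game; the prompt distribution $\rho$, preference probability $\mathcal{P}$, regularization weight $\tau$, and reference policy $\pi_{\mathrm{ref}}$ are notational assumptions here, not the paper's own definitions. At a Nash equilibrium $(\pi_1^{*}, \dots, \pi_n^{*})$, each player best-responds to the opponent population while staying close to the reference:

```latex
% Hypothetical n-player regularized preference game (sketch, not the paper's exact notation):
% player i maximizes its average win rate against the other players' equilibrium
% policies, minus a KL penalty that keeps it near the reference policy pi_ref.
\[
  \pi_i^{*} \;\in\; \arg\max_{\pi_i}\;
    \frac{1}{n-1} \sum_{j \neq i}
      \mathbb{E}_{\,x \sim \rho,\; y \sim \pi_i(\cdot \mid x),\; y' \sim \pi_j^{*}(\cdot \mid x)}
      \big[\, \mathcal{P}(y \succ y' \mid x) \,\big]
    \;-\; \tau\, \mathrm{KL}\!\big(\pi_i \,\|\, \pi_{\mathrm{ref}}\big),
  \qquad i = 1, \dots, n.
\]
```

For n = 2 this reduces to the standard KL-regularized two-player preference game that NLHF methods such as INPO approximate.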
Key Result
MNPO consistently outperforms existing two-player NLHF baselines (INPO, ONPO, EGPO) on instruction-following benchmarks, achieving superior alignment quality under heterogeneous annotator conditions and mixed-policy evaluation scenarios.
Source
- Link: https://arxiv.org/abs/2509.23102
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- tools-for-aligning-multiple-ais — Multi-agent first