Multiplayer Nash Preference Optimization

Fang Wu, Xu Huang, Weihao Xuan, Zhiwei Zhang, Yijia Xiao, Guancheng Wan, … (+5 more) — 2025-09-27 — Stanford University, University of Washington, AI2 — arXiv

Summary

Extends Nash learning from human feedback (NLHF) from the usual two-player game to the multiplayer setting: alignment is formulated as an n-player game in which each policy competes against a population of opponents while being regularized toward a reference model, and MNPO seeks the resulting multiplayer Nash equilibrium.
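
In one hedged formalization (the notation below is inferred from this summary, not taken from the paper), each player maximizes its expected preference win-rate against the mixture of the other players' policies, minus a KL anchor to the reference model:

```latex
% Sketch of a per-player objective consistent with the summary above.
% \mathcal{P}, \bar{\pi}_{-i}, and \tau are assumed notation for the pairwise
% preference probability, the opponent mixture, and the regularization strength.
\[
  \pi_i^\star \in \arg\max_{\pi_i}\;
    \mathbb{E}_{y \sim \pi_i(\cdot \mid x),\; y' \sim \bar{\pi}_{-i}(\cdot \mid x)}
      \bigl[ \mathcal{P}(y \succ y' \mid x) \bigr]
    \;-\; \tau\, \mathrm{KL}\bigl( \pi_i \,\|\, \pi_{\mathrm{ref}} \bigr)
\]
```

A multiplayer Nash equilibrium is then a profile in which no player can improve this objective unilaterally; with n = 2 and a single opponent this reduces to the standard two-player NLHF game.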

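To make the population-play idea concrete, here is a minimal toy sketch on a discrete response set: a synthetic preference matrix stands in for the annotator, each player's opponent is the mixture of the other players, and the update is a KL-regularized mirror-descent step of the kind used in two-player NLHF analyses. Every name and number below (K, n, tau, eta, P, the update rule) is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

# Toy illustration of population play on a discrete response set.
rng = np.random.default_rng(0)
K = 5                    # candidate responses
n = 3                    # players in the population
tau, eta, steps = 0.1, 0.5, 200

# P[a, b] = probability that response a is preferred over response b.
P = rng.uniform(size=(K, K))
P = P / (P + P.T)        # enforce P[a, b] + P[b, a] = 1

pi_ref = np.full(K, 1.0 / K)                   # uniform reference policy
pis = [np.full(K, 1.0 / K) for _ in range(n)]  # one policy per player

for _ in range(steps):
    new_pis = []
    for i, pi in enumerate(pis):
        # The mixture of the other n-1 players acts as the opponent population.
        opp = np.mean([pis[j] for j in range(n) if j != i], axis=0)
        win = P @ opp    # expected win-rate of each response vs. that mixture
        # KL-regularized mirror-descent step (borrowed from two-player NLHF
        # analyses, an assumption here): the new policy is proportional to
        # pi^(1 - eta*tau) * pi_ref^(eta*tau) * exp(eta * win).
        logits = (1 - eta * tau) * np.log(pi) + eta * tau * np.log(pi_ref) + eta * win
        new_pi = np.exp(logits - logits.max())
        new_pis.append(new_pi / new_pi.sum())
    pis = new_pis

print(np.round(pis[0], 3))  # player 1's approximate equilibrium policy
```

The geometric mixing weights (1 - eta*tau) and eta*tau are how the KL anchor shows up in multiplicative-weights updates; the paper itself operates on LLM policies rather than probability vectors, so treat this purely as intuition for the game dynamics.
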
Key Result

On instruction-following benchmarks, MNPO consistently outperforms two-player NLHF baselines (INPO, ONPO, EGPO), and it maintains superior alignment quality under heterogeneous annotator conditions and mixed-policy evaluation.

Source