AssistanceZero: Scalably Solving Assistance Games
Cassidy Laidlaw, Eli Bronstein, Timothy Guo, Dylan Feng, Lukas Berglund, Justin Svegliato, … (+2 more) — 2025-04-09 — UC Berkeley — arXiv
Summary
Presents AssistanceZero, the first scalable algorithm for solving assistance games. It extends AlphaZero with neural networks that predict human actions and rewards, which lets the assistant plan under uncertainty about the shared goal.
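To make "planning under uncertainty about shared goals" concrete, here is a minimal toy sketch in Python. It is not the paper's method: there are no neural networks and no tree search, only a Bayesian belief over goals and a one-step lookahead. The goal set, the Boltzmann human model, the `beta` parameter, the reward values, and all function names are hypothetical choices made for illustration.

```python
import math

# Hypothetical toy setup: the human wants one of three block types placed in a
# single cell, and the assistant does not know which one.
GOALS = ["stone", "wood", "glass"]
ACTIONS = ["stone", "wood", "glass", "noop"]


def human_policy(action, goal, beta=2.0):
    """Assumed Boltzmann-rational human model: P(action | goal)."""
    def score(a):
        return beta if a == goal else 0.0
    z = sum(math.exp(score(a)) for a in GOALS)
    return math.exp(score(action)) / z


def update_belief(belief, observed_action):
    """Bayes rule: posterior over goals after observing one human action."""
    unnorm = {g: p * human_policy(observed_action, g) for g, p in belief.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


def reward(action, goal):
    """Assumed shared reward: +1 for the right block, -1 for a wrong one, 0 for waiting."""
    if action == "noop":
        return 0.0
    return 1.0 if action == goal else -1.0


def best_assistant_action(belief):
    """One-step lookahead: pick the action with the highest expected reward under the belief."""
    def expected(a):
        return sum(p * reward(a, g) for g, p in belief.items())
    return max(ACTIONS, key=expected)


prior = {g: 1.0 / len(GOALS) for g in GOALS}
print(best_assistant_action(prior))      # waits while the goal is unclear
posterior = update_belief(prior, "wood")
print(best_assistant_action(posterior))  # helps once the human's action is informative
```

Under a uniform prior, every placement has negative expected reward in this toy, so the assistant waits. After one observed human action the belief concentrates enough for helping to pay off. The sketch shows the belief-then-plan structure on a small scale; per the summary above, AssistanceZero replaces the hand-written human model and reward with learned predictors and the one-step lookahead with AlphaZero-style planning.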
Key Result
AssistanceZero outperforms model-free RL and imitation learning in a Minecraft-based assistance game with over 10^400 possible goals. In human studies, it significantly reduces the number of actions participants take.
Source
- Link: https://arxiv.org/abs/2504.07091
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- assistance-games-assistive-agents — Black-box safety (understand and control current model behaviour) / Goal robustness