Mitigating Goal Misgeneralization via Minimax Regret

Karim Abdel Sadek, Matthew Farrugia-Roberts, Usman Anwar, Hannah Erlebach, Christian Schroeder de Witt, David Krueger, … (+1 more) — 2025-07-03 — RLC 2025

Summary

Formalizes goal misgeneralization in RL and proves that minimax expected regret (MMER) objectives are robust to goal misgeneralization while maximum expected value (MEV) objectives are not, with empirical validation in grid-world environments.

Key Result

In theory, training with a minimax expected regret (MMER) objective rules out goal misgeneralization; empirically, it is more robust than standard domain randomization, which optimizes a maximum expected value (MEV) objective.
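
To make the contrast between the two objectives concrete, here is a minimal toy sketch. It is not the paper's formalism or experimental setup: the two policies, the two levels, the return values, the training distribution, and the tolerance eps are all hypothetical numbers chosen for illustration, and the best achievable return on each level is assumed to be the best among the two listed policies. The sketch shows a proxy-goal policy that is nearly optimal in expected value, because the level that distinguishes the goals is rare, yet has high worst-case regret.

```python
# Toy illustration with hypothetical numbers (not taken from the paper).
policies = ["intended", "proxy"]
levels = ["ambiguous", "distinguishing"]

# value[p][i]: return of policy p on level i, measured under the intended goal.
value = {
    "intended": [1.0, 1.0],  # pursues the intended goal on every level
    "proxy": [1.0, 0.0],     # pursues a proxy goal; fails where the goals differ
}

# Assumed training distribution: the distinguishing level is rare.
train_dist = [0.99, 0.01]


def expected_value(policy):
    """MEV score: expected return under the training distribution."""
    return sum(p * v for p, v in zip(train_dist, value[policy]))


def max_regret(policy):
    """MMER score: worst-case regret over levels.

    Regret on a level is the best return among the listed policies
    (an assumption of this sketch) minus this policy's return.
    """
    best = [max(value[q][i] for q in policies) for i in range(len(levels))]
    return max(b - v for b, v in zip(best, value[policy]))


def approx_optimal(scores, eps, higher_is_better):
    """Return the policies whose score is within eps of the best score."""
    target = max(scores.values()) if higher_is_better else min(scores.values())
    return sorted(p for p, s in scores.items() if abs(s - target) <= eps)


mev_scores = {p: expected_value(p) for p in policies}
mmer_scores = {p: max_regret(p) for p in policies}

mev_ok = approx_optimal(mev_scores, eps=0.05, higher_is_better=True)
mmer_ok = approx_optimal(mmer_scores, eps=0.05, higher_is_better=False)

print("approximately MEV-optimal: ", mev_ok)
print("approximately MMER-optimal:", mmer_ok)
```

In this toy, both policies pass the approximate MEV check, while only the intended-goal policy passes the approximate MMER check, since the proxy policy's regret on the distinguishing level is 1.0 regardless of how rarely that level is sampled.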

Source