The Partially Observable Off-Switch Game

Andrew Garber, Rohan Subramani, Linus Luu, Mark Bedaywi, Stuart Russell, Scott Emmons — 2024-11-25 — UC Berkeley — arXiv

Summary

Introduces the Partially Observable Off-Switch Game (PO-OSG), a game-theoretic formalization of the shutdown problem with asymmetric information between humans and AI agents, analyzing conditions under which AIs defer to human shutdown commands.

Key Result

Under optimal play with partial observability, an AI agent assisting a perfectly rational human sometimes avoids shutdown. Introducing bounded communication, which reduces the information asymmetry, can counterintuitively make the AI defer less.
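
A minimal toy sketch of why deference can fail under asymmetric information. It is not the paper's model: the two states, their probabilities, the payoffs, and all function names are hypothetical, and the human here simply responds to an always-defer policy with their prior. The AI privately observes the state, while the human observes nothing.

```python
# Toy illustration only. States, probabilities, and payoffs are hypothetical
# and are not taken from the paper.
# state -> (probability, shared utility if the AI's action goes ahead)
STATES = {"good": (0.4, 1.0), "bad": (0.6, -2.0)}


def prior_expected_utility(states):
    """Expected utility of the action under the human's prior (no observation)."""
    return sum(p * u for p, u in states.values())


def value_always_defer(states):
    """AI always defers; an uninformed rational human allows the action
    only if its prior expected utility is positive, else shuts down (payoff 0)."""
    eu = prior_expected_utility(states)
    return eu if eu > 0 else 0.0


def value_act_on_private_info(states):
    """AI never defers: it acts when it sees positive utility and
    switches itself off (payoff 0) otherwise."""
    return sum(p * max(u, 0.0) for p, u in states.values())


if __name__ == "__main__":
    print("always defer:", value_always_defer(STATES))
    print("act on private information:", value_act_on_private_info(STATES))
```

In this toy setting the always-defer policy earns less than acting on private information, because the uninformed human shuts down an action that is sometimes beneficial. Both players share the same utility, so the gap comes from information, not from conflicting goals.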

Source