Observation Interference in Partially Observable Assistance Games

Scott Emmons, Caspar Oesterheld, Vincent Conitzer, Stuart Russell — 2024-12-23 — UC Berkeley — arXiv

Summary

Extends the assistance games framework to partially observable settings and proves that an optimal AI assistant must sometimes interfere with the human's observations. This appears to contradict the classical decision-theoretic result that, for a single rational decision-maker, additional information never has negative expected value.
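The classical single-agent fact referred to above can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes a hypothetical two-state, two-action decision problem with a prior over states, a utility table, and an observation model `likelihood[s][o] = P(o | s)`. It computes the best expected utility with and without the observation. The function names and numbers are illustrative assumptions. The sketch models only one decision-maker, not the assistant-human interaction the paper studies.

```python
def value_without_info(prior, utility):
    """Best expected utility when one action is chosen using only the prior."""
    actions = range(len(utility[0]))
    return max(sum(p * utility[s][a] for s, p in enumerate(prior)) for a in actions)


def value_with_signal(prior, utility, likelihood):
    """Best expected utility when the action may depend on the observation.

    likelihood[s][o] is the (hypothetical) probability of observing o in state s.
    For each observation we maximize over actions using the joint weights
    P(s, o) = P(s) * P(o | s); summing these maxima equals
    sum_o P(o) * max_a E[U | o].
    """
    actions = range(len(utility[0]))
    total = 0.0
    for o in range(len(likelihood[0])):
        joint = [prior[s] * likelihood[s][o] for s in range(len(prior))]
        total += max(sum(j * utility[s][a] for s, j in enumerate(joint)) for a in actions)
    return total


def value_of_information(prior, utility, likelihood):
    """Expected gain from observing the signal before acting (never negative)."""
    return value_with_signal(prior, utility, likelihood) - value_without_info(prior, utility)


if __name__ == "__main__":
    prior = [0.5, 0.5]
    utility = [[1, 0], [0, 1]]  # action 0 is right in state 0, action 1 in state 1
    perfect = [[1, 0], [0, 1]]
    noisy = [[0.8, 0.2], [0.2, 0.8]]
    useless = [[0.5, 0.5], [0.5, 0.5]]
    print(value_of_information(prior, utility, perfect))           # 0.5
    print(round(value_of_information(prior, utility, noisy), 3))   # 0.3
    print(value_of_information(prior, utility, useless))           # 0.0
```

Because a decision-maker who observes the signal can always ignore it and play the prior-optimal action, the value with the signal is at least the value without it. The paper's result concerns why this single-agent guarantee does not straightforwardly rule out observation interference by an assistant acting alongside a human.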

Key Result

Optimal assistants must sometimes take observation-interfering actions, even when the human is playing optimally and otherwise-equivalent non-interfering actions are available. The paper resolves the apparent contradiction with classical value-of-information results by developing a policy-level notion of interference.

Source