Observation Interference in Partially Observable Assistance Games

Scott Emmons, Caspar Oesterheld, Vincent Conitzer, Stuart Russell — 2024-12-23 — UC Berkeley — arXiv

Summary

Extends the assistance-games framework to partially observable settings and proves that optimal AI assistants sometimes must interfere with the human's observations, even though this appears to contradict the classical decision-theoretic result that information never has negative value.
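For context, the classical result that the paper's finding appears to contradict can be illustrated with a toy calculation. This is a minimal sketch, not a model from the paper: the states, actions, prior, and utilities below are all hypothetical. It shows that a single expected-utility maximizer who observes the true state before acting never does worse than one who must act on the prior alone.

```python
"""Toy illustration of the classical value-of-information result.

All states, actions, and numbers are hypothetical; this is not the
paper's assistance-game model, only a single-agent decision problem.
"""
from fractions import Fraction

# Hypothetical prior over two world states.
PRIOR = {"rain": Fraction(3, 10), "sun": Fraction(7, 10)}

# Hypothetical utility of each (action, state) pair for one decision-maker.
UTILITY = {
    ("umbrella", "rain"): 8,
    ("umbrella", "sun"): 5,
    ("no_umbrella", "rain"): 0,
    ("no_umbrella", "sun"): 10,
}
ACTIONS = ("umbrella", "no_umbrella")


def value_without_observation():
    """Best expected utility when committing to one action using only the prior."""
    return max(sum(PRIOR[s] * UTILITY[(a, s)] for s in PRIOR) for a in ACTIONS)


def value_with_perfect_observation():
    """Expected utility when the true state is observed before choosing."""
    return sum(PRIOR[s] * max(UTILITY[(a, s)] for a in ACTIONS) for s in PRIOR)


# Value of (perfect) information: never negative for a single expected-utility maximizer.
voi = value_with_perfect_observation() - value_without_observation()
print(voi)  # prints 12/5
```

In a partially observable assistance game, the human's observations also shape what the assistant can infer and how the human acts, which is why this single-agent guarantee does not transfer directly.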

Key Result

Optimal assistants sometimes must take observation-interfering actions, even when the human plays optimally and otherwise-equivalent non-interfering actions are available. The paper resolves the apparent contradiction with classical decision theory by developing a policy-level notion of interference.

Source

Link: https://arxiv.org/abs/2412.17797
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- assistance-games-assistive-agents — Black-box safety (understand and control current model behaviour) / Goal robustness
