Off-switching not guaranteed

Sven Neth — 2025-02-26 — University of Pittsburgh — Philosophical Studies

Summary

Critiques Hadfield-Menell et al.’s Off-Switch Game by identifying two conditions under which AI agents would not defer to humans: when agents do not value learning, and when agents value information but are uncertain whether they will learn the human’s actual preferences.

Source