Off-switching not guaranteed

Sven Neth — 2025-02-26 — University of Pittsburgh — Philosophical Studies

Summary

Critiques the Off-Switch Game of Hadfield-Menell et al. by identifying two conditions under which an AI agent would not defer to humans: when the agent does not value learning, and when the agent values information but is uncertain whether it will learn the human’s actual preferences.
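To make the deference argument concrete, here is a minimal sketch of a one-shot Off-Switch Game. It is not Neth's formal model. The function names, the sample utilities, and the `human_error_rate` parameter are all illustrative assumptions. The agent holds a belief over the utility of its action and compares acting directly, switching itself off, and deferring to a human who may approve or veto. The noisy human stands in, loosely, for the case where the human's response does not reliably reveal their actual preferences.

```python
def off_switch_values(utility_samples, human_error_rate=0.0):
    """Expected value of each option under the agent's belief.

    utility_samples: equally weighted possible utilities of the action
                     (hypothetical belief, for illustration only).
    human_error_rate: assumed probability that the human makes the wrong
                      call (approves a harmful action or vetoes a good one).
    """
    n = len(utility_samples)
    act = sum(utility_samples) / n   # act without asking
    switch_off = 0.0                 # switching off yields utility 0 by convention
    defer = 0.0
    for u in utility_samples:
        correct_call = u if u >= 0 else 0.0   # approve iff the action is good
        wrong_call = 0.0 if u >= 0 else u     # the opposite decision
        defer += (1 - human_error_rate) * correct_call + human_error_rate * wrong_call
    defer /= n
    return {"act": act, "switch_off": switch_off, "defer": defer}


def best_action(utility_samples, human_error_rate=0.0):
    """Name of the option with the highest expected value."""
    values = off_switch_values(utility_samples, human_error_rate)
    return max(values, key=values.get)


belief = [-1.0, 0.5, 2.0]  # hypothetical utilities the agent considers possible
print(best_action(belief, human_error_rate=0.0))  # reliable human
print(best_action(belief, human_error_rate=0.5))  # unreliable human
```

With a perfectly reliable human, deferring is worth the expectation of max(U, 0), which is never less than acting or switching off. Once the human's response stops tracking the actual utility, that guarantee can fail, which is the general shape of the concern summarized above.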

Source

Link: https://link.springer.com/article/10.1007/s11098-025-02296-x
Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- agent-foundations — Theory
