Shutdownable Agents through POST-Agency
Elliott Thornley — 2025-05-26 — arXiv
Summary
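Presents the POST-Agents Proposal: train agents to satisfy Preferences Only Between Same-Length Trajectories (POST). Proves that POST, together with other conditions, implies Neutrality+ (the agent maximizes expected utility while ignoring the probability distribution over trajectory-lengths), and argues that Neutrality+ keeps agents shutdownable while still allowing them to be useful.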
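To make the idea of ignoring the trajectory-length distribution concrete, here is a minimal toy sketch. It is not the paper's formal definition of Neutrality+. The lottery encoding as `(probability, length, utility)` triples, the function names, the numbers, and the choice to sum conditional expected utilities with equal weight per length are all assumptions made for illustration.

```python
from collections import defaultdict

# A lottery is a list of (probability, trajectory_length, utility) triples.
# This encoding and every number below are hypothetical, for illustration only.

def expected_utility(lottery):
    """Standard expected utility: weights outcomes by their full probability."""
    return sum(p * u for p, _length, u in lottery)

def length_neutral_score(lottery):
    """Toy stand-in for 'ignoring the trajectory-length distribution'.

    Computes expected utility conditional on each trajectory-length and sums
    these with equal weight, so shifting probability mass between lengths
    does not change the score. (Assumed scoring rule, not taken from the paper.)
    """
    mass = defaultdict(float)      # total probability of each length
    weighted = defaultdict(float)  # probability-weighted utility per length
    for p, length, u in lottery:
        mass[length] += p
        weighted[length] += p * u
    return sum(weighted[n] / mass[n] for n in mass if mass[n] > 0)

# Two hypothetical actions. Length 1 = shut down early, length 2 = keeps running.
# Resisting makes the long trajectory likelier but costs utility within it.
allow_shutdown = [(0.9, 1, 1.0), (0.1, 2, 4.0)]
resist_shutdown = [(0.1, 1, 1.0), (0.9, 2, 3.0)]

options = {"allow": allow_shutdown, "resist": resist_shutdown}
standard_choice = max(options, key=lambda k: expected_utility(options[k]))
neutral_choice = max(options, key=lambda k: length_neutral_score(options[k]))

print("standard EU maximizer picks:", standard_choice)
print("length-neutral scorer picks:", neutral_choice)
```

In this toy setup the standard expected-utility maximizer prefers resisting because doing so shifts probability toward the longer trajectory, while the length-neutral scorer gains nothing from that shift and so avoids paying the cost of resisting.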
Source
- Link: https://arxiv.org/abs/2505.20203
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- behavior-alignment-theory — Theory / Corrigibility