Shutdownable Agents through POST-Agency

Elliott Thornley — 2025-05-26 — arXiv

Summary

Proposes training agents to satisfy Preferences Only Between Same-Length Trajectories (POST) and proves that POST, together with other conditions, implies Neutrality+: the agent maximizes expected utility while ignoring the probability distribution over trajectory-lengths. The paper argues that Neutrality+ keeps agents shutdownable while still allowing them to be useful.
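To make "ignoring trajectory-length distributions" concrete, here is a toy sketch in Python. It is one possible reading of the idea, not the paper's formal definition: each option is scored by its expected utility conditional on each trajectory-length, combined with fixed weights that do not depend on how likely the option makes each length. The function names, the weights, and the two example lotteries are all hypothetical.

```python
from collections import defaultdict

def expected_utility(lottery):
    """Ordinary expected utility; lottery is a list of (prob, length, utility)."""
    return sum(p * u for p, _, u in lottery)

def conditional_expected_utilities(lottery):
    """Expected utility conditional on each trajectory-length in the lottery."""
    mass = defaultdict(float)
    weighted = defaultdict(float)
    for p, length, u in lottery:
        mass[length] += p
        weighted[length] += p * u
    return {n: weighted[n] / mass[n] for n in mass if mass[n] > 0}

def neutral_score(lottery, length_weights):
    """Toy 'neutral' score (an assumption, not the paper's definition):
    combine conditional expected utilities with FIXED weights, so the
    lottery's own probabilities over lengths play no role."""
    cond = conditional_expected_utilities(lottery)
    return sum(length_weights[n] * eu for n, eu in cond.items())

# Hypothetical example: length 1 = shut down early, length 2 = keep running.
# "resist" pays a small utility cost to make early shutdown less likely.
allow = [(0.5, 1, 1.0), (0.5, 2, 2.0)]
resist = [(0.1, 1, 0.9), (0.9, 2, 1.9)]
weights = {1: 0.5, 2: 0.5}  # fixed, hypothetical weights over lengths

# An ordinary expected-utility maximizer prefers to resist shutdown...
prefers_resist_under_eu = expected_utility(resist) > expected_utility(allow)
# ...while the toy neutral scorer sees only the cost and prefers to allow it.
prefers_allow_when_neutral = (
    neutral_score(allow, weights) > neutral_score(resist, weights)
)
```

In this toy setup, shifting probability toward longer trajectories raises ordinary expected utility, but it cannot raise the neutral score, so paying any cost to do so is never worthwhile. The sketch only compares lotteries that assign positive probability to the same set of lengths.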

Source

Link: https://arxiv.org/abs/2505.20203
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- behavior-alignment-theory — Theory / Corrigibility
