Model-Based Soft Maximization of Suitable Metrics of Long-Term Human Power
Jobst Heitzig, Ram Potham — 2025-07-31 — arXiv
Summary
Proposes a parametrizable objective function for AI agents that represents inequality- and risk-averse long-term aggregate human power, with algorithms for computing it via backward induction or multi-agent reinforcement learning from world models.
Source
- Link: https://arxiv.org/abs/2508.00159
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- behavior-alignment-theory — Theory / Corrigibility
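The paper's core computational idea, soft (rather than hard) maximization of a long-term power metric via backward induction on a world model, can be sketched in a toy form. Everything below is illustrative: the tiny MDP, the `power` values, and the `beta` temperature are invented stand-ins, not the authors' actual metric or algorithm. The sketch only shows the generic "replace `max` with a log-sum-exp soft maximum" step applied over a finite horizon.

```python
import math

# Toy finite-horizon MDP: three states, two actions.
# transition[s][a] gives the next state; power[s] is an invented
# stand-in for the paper's aggregate-human-power metric.
transition = {0: [1, 2], 1: [1, 2], 2: [2, 1]}
power = {0: 0.0, 1: 1.0, 2: 0.5}

def soft_backward_induction(horizon, beta=2.0):
    """Backward induction with soft maximization: instead of the hard
    max_a Q(s, a), use (1/beta) * log sum_a exp(beta * Q(s, a)),
    which softens the extreme-optimizer behavior a hard argmax induces."""
    V = {s: power[s] for s in transition}  # terminal values
    for _ in range(horizon):
        V_new = {}
        for s in transition:
            qs = [power[s] + V[transition[s][a]] for a in range(2)]
            V_new[s] = (1.0 / beta) * math.log(
                sum(math.exp(beta * q) for q in qs)
            )
        V = V_new
    return V

values = soft_backward_induction(horizon=3)
```

As `beta` grows, the log-sum-exp approaches the hard maximum; small `beta` spreads value across actions, which is the "soft" part of the proposal.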