Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well). Subtleties and Open Challenges
Roland Pihlakas — 2025-01-12 — LessWrong
Summary
Proposes multi-objective homeostasis (maintaining variables within bounded target ranges rather than unbounded maximization) as essential for AI alignment, arguing it naturally enables corrigibility, task-based behavior, and reduced incentive for extreme optimization.
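To illustrate the core contrast (this sketch is not from the post itself; the setpoint values and the absolute-deviation penalty are illustrative assumptions), a homeostatic objective peaks at a target value and penalizes deviation in either direction, whereas an unbounded objective always rewards "more":

```python
def unbounded_utility(x):
    # Unbounded maximization: more is always better,
    # so the agent is incentivized toward extremes.
    return sum(x)

def homeostatic_utility(x, setpoints):
    # Multi-objective homeostasis: utility is highest when each
    # variable sits at its target; overshooting is penalized just
    # like undershooting, removing the incentive for extreme values.
    return -sum(abs(v - s) for v, s in zip(x, setpoints))

# Hypothetical homeostatic variables: body temperature, hydration.
setpoints = [37.0, 0.5]
at_target = homeostatic_utility([37.0, 0.5], setpoints)  # 0.0, the maximum
overshoot = homeostatic_utility([45.0, 0.9], setpoints)  # negative: "more" is worse
```

Under the unbounded objective, the overshooting state would score strictly higher; under the homeostatic one it scores strictly lower, which is the reduced-incentive-for-extreme-optimization property the summary refers to.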
Source
- Link: https://lesswrong.com/posts/vGeuBKQ7nzPnn5f7A/why-modelling-multi-objective-homeostasis-is-essential-for
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
  - mild-optimisation — Black-box safety (understand and control current model behaviour) / Goal robustness