Is alignment reducible to becoming more coherent?
Cole Wyeth — 2025-04-22 — LessWrong
Summary
Proposes framing alignment as rationality enhancement: an observation-utility agent learns its utility function from preferences elicited through terminal interaction with the user, treating both manipulation of the user and changes in the user's values as forms of incoherence to be resolved rather than as fundamental obstacles.
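
To make the framing concrete, below is a minimal toy sketch (my own illustration, not code from the post): an agent keeps weights over candidate utility functions, elicits pairwise preferences at a terminal, and treats an answer that would create a preference cycle as incoherence to be resolved by re-querying. The candidate set, update rule, and all names here are illustrative assumptions.

```python
# Hypothetical sketch of an observation-utility agent learning preferences
# at a terminal; preference cycles are flagged as incoherence to resolve,
# not treated as fatal errors. All details are illustrative assumptions.
from itertools import combinations

# Candidate utility functions over observations (here, simple strings).
CANDIDATES = {
    "brevity": lambda obs: -len(obs),   # prefers shorter outcomes
    "verbosity": lambda obs: len(obs),  # prefers longer outcomes
}
weights = {name: 1.0 for name in CANDIDATES}  # uniform prior over candidates


def expected_utility(obs: str) -> float:
    """Utility of an observation under the current candidate mixture."""
    total = sum(weights.values())
    return sum(w * CANDIDATES[n](obs) for n, w in weights.items()) / total


def update_on_preference(preferred: str, rejected: str, lr: float = 0.5) -> None:
    """Downweight candidates that disagree with the stated preference."""
    for name, u in CANDIDATES.items():
        if u(preferred) < u(rejected):
            weights[name] *= lr


def creates_cycle(graph: dict, preferred: str, rejected: str) -> bool:
    """Would recording `preferred > rejected` close a preference cycle?"""
    stack, visited = [rejected], set()
    while stack:  # DFS: can we already reach `preferred` from `rejected`?
        node = stack.pop()
        if node == preferred:
            return True
        if node not in visited:
            visited.add(node)
            stack.extend(graph.get(node, ()))
    return False


def elicit_preference(a: str, b: str) -> tuple:
    """Ask the user at the terminal which of two outcomes they prefer."""
    ans = input(f"Prefer (1) {a!r} or (2) {b!r}? ").strip()
    return (a, b) if ans == "1" else (b, a)


def run_session(outcomes: list) -> None:
    graph = {}  # maps each preferred outcome to the outcomes it beat
    for a, b in combinations(outcomes, 2):
        preferred, rejected = elicit_preference(a, b)
        # Incoherence handling: re-query instead of aborting.
        while creates_cycle(graph, preferred, rejected):
            print("Incoherent with earlier answers; please reconsider.")
            preferred, rejected = elicit_preference(a, b)
        graph.setdefault(preferred, []).append(rejected)
        update_on_preference(preferred, rejected)
    best = max(outcomes, key=expected_utility)
    print(f"Best outcome under the learned utility: {best!r}")


if __name__ == "__main__":
    run_session(["ok", "okay", "okay then"])
```

In this toy, "becoming more coherent" corresponds to eliminating preference cycles before committing an update, which is one way to read the post's claim that manipulation and user change are kinds of incoherence rather than separate problems.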
Source
- Link: https://lesswrong.com/posts/nuDJNyG5XLQjtvaeg/is-alignment-reducible-to-becoming-more-coherent
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
  - agent-foundations — Theory