Is alignment reducible to becoming more coherent?

Cole Wyeth — 2025-04-22 — LessWrong

Summary

Proposes framing alignment as rationality enhancement: observation-utility agents learn the user's preferences through terminal interaction, and both manipulation of the user and changes in the user are treated as forms of incoherence to be resolved rather than as fundamental obstacles.

Source