A single principle related to many Alignment subproblems?
Q Home — 2025-04-30 — LessWrong
Summary
Proposes the philosophical principle that humans care only about variables they could easily optimize or comprehend, and argues that formalizing this principle mathematically could help address several alignment subproblems, including ontology identification, Eliciting Latent Knowledge (ELK), impact measures, and inner/outer alignment.
Source
- Link: https://lesswrong.com/posts/h89L5FMAkEBNsZ3xM/a-single-principle-related-to-many-alignment-subproblems-2
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- natural-abstractions — Theory / Ontology Identification