A single principle related to many Alignment subproblems?
Q Home — 2025-04-30 — LessWrong
Summary
Proposes the philosophical principle that humans care only about variables they could easily optimize or comprehend, and argues that formalizing this principle mathematically could help address several alignment subproblems, including ontology identification, Eliciting Latent Knowledge (ELK), impact measures, and inner/outer alignment.
Source
- Link: https://lesswrong.com/posts/h89L5FMAkEBNsZ3xM/a-single-principle-related-to-many-alignment-subproblems-2
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- natural-abstractions — Theory / Ontology Identification