Natural abstractions — SR2025 Agenda Snapshot

One-sentence summary: Develop a theory of concepts that explains how they are learned, how they structure a particular system’s understanding, and how mutual translatability can be achieved between different collections of concepts.
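
For concreteness, here is a hedged sketch of one formalization from this agenda's own "natural latents" line of work (Wentworth & Lorell); the notation is ours, not from this page. A concept is modeled as a latent variable that both mediates the observables and is redundantly encoded across them:

```latex
% Sketch: a latent \Lambda over observables X_1, \dots, X_n is "natural" when
\begin{align*}
  \text{(mediation)}\quad  & P(X_1,\dots,X_n \mid \Lambda)
      = \textstyle\prod_{i} P(X_i \mid \Lambda)
      && \text{the } X_i \text{ are independent given } \Lambda \\
  \text{(redundancy)}\quad & H(\Lambda \mid X_{\neg i}) \approx 0
      \;\; \text{for each } i
      && \Lambda \text{ survives dropping any one } X_i
\end{align*}
% Any two latents satisfying both conditions over the same observables are
% (approximately) informationally equivalent; this equivalence is what makes
% "mutual translatability" between different systems' concepts plausible.
```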

Theory of Change

Identify the concepts that structure a system’s understanding, then use them to inspect its “alignment/safety properties” and/or “retarget its search”, i.e. locate utility-function-like components inside an AI and replace calls to them with calls to “user values” (represented using the AI’s existing internal abstractions).
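
A minimal toy sketch of the "retarget the search" move, assuming interpretability has already cleanly located the internal utility component; the sketch and all names in it are hypothetical, not from the source:

```python
# Toy model: an agent whose behavior factors into generic search plus an
# internal utility-function-like component. "Retargeting" swaps that
# component for a user-value function expressed over the agent's own
# abstractions (here, the shared Plan type).

from dataclasses import dataclass
from typing import Callable, Iterable

Plan = tuple[str, ...]            # a candidate course of action
Utility = Callable[[Plan], float]

@dataclass
class Agent:
    utility: Utility              # the utility-function-like component

    def search(self, candidates: Iterable[Plan]) -> Plan:
        # Generic search: return whichever candidate the internal
        # utility scores highest. Retargeting leaves this untouched.
        return max(candidates, key=self.utility)

def retarget(agent: Agent, user_values: Utility) -> None:
    # Replace calls to the agent's internal utility with calls to user
    # values, represented using the agent's existing abstractions.
    agent.utility = user_values

# Usage: the agent initially maximizes paperclips; after retargeting,
# the same search machinery maximizes a stand-in user value instead.
agent = Agent(utility=lambda plan: plan.count("make_paperclip"))
candidates = [("make_paperclip",) * 3, ("help_user", "help_user")]
assert agent.search(candidates) == ("make_paperclip",) * 3

retarget(agent, user_values=lambda plan: plan.count("help_user"))
assert agent.search(candidates) == ("help_user", "help_user")
```

The hard parts this sketch assumes away are exactly the agenda's research targets: finding the utility-like component inside an opaque system, and expressing user values in the system's learned abstractions.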

Broad Approach

cognitive

Target Case

worst-case

Orthodox Problems Addressed

Instrumental convergence, Superintelligence can fool human supervisors, Humans cannot be first-class parties to a superintelligent value handshake

Key People

John Wentworth, Paul Colognese, David Lorell, Sam Eisenstat, Fernando Rosas

Funding

Estimated FTEs: 1-10

Critiques

Chan et al. (2023); Soto, Harwood & Soares (2023)

See Also

causal-abstractions, representational alignment, convergent abstractions, feature universality, Platonic representation hypothesis, microscope AI

Outputs in 2025

10 items in the review. See the wiki/summaries/ entries with frontmatter agenda: natural-abstractions; they were generated alongside this file from the same export.

Sources cited

Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; to update them, re-run the script rather than editing by hand.