Brainlike-AGI Safety — SR2025 Agenda Snapshot

One-sentence summary: Social and moral instincts are (partly) implemented in particular hardwired brain circuitry; let’s figure out what those circuits are and how they work (this will involve symbol grounding). The anticipated AGI paradigm is “a yet-to-be-invented variation on actor-critic model-based reinforcement learning”.
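For readers unfamiliar with the quoted phrase, the sketch below is a minimal, purely illustrative example of tabular actor-critic reinforcement learning augmented with a learned one-step world model (Dyna-style planning updates). The environment, hyperparameters, and update rules are toy assumptions chosen for brevity; this is a gloss on the standard technique, not the agenda’s proposed architecture.

```python
# Toy gloss on "actor-critic model-based RL": tabular actor-critic on a 5-state
# chain, with a learned one-step model used for Dyna-style planning updates.
# All details here are illustrative assumptions, not the agenda's proposal.
import math
import random
from collections import defaultdict

N_STATES, GOAL = 5, 4            # chain of states 0..4; reward for reaching state 4
ACTIONS = (-1, +1)               # step left / step right
ALPHA, GAMMA, PLAN_STEPS = 0.1, 0.95, 5

V = defaultdict(float)           # critic: state-value estimates
prefs = defaultdict(float)       # actor: softmax action preferences
model = {}                       # learned world model: (s, a) -> (reward, next state)

def policy(s):
    """Softmax over the actor's preferences for state s."""
    exps = [math.exp(prefs[(s, a)]) for a in ACTIONS]
    z = sum(exps)
    return [e / z for e in exps]

def step(s, a):
    """True environment dynamics (unknown to the agent)."""
    s2 = max(0, min(GOAL, s + a))
    return (1.0 if s2 == GOAL else 0.0), s2

for episode in range(500):
    s = 0
    while s != GOAL:
        pi = policy(s)
        a = random.choices(ACTIONS, weights=pi)[0]
        r, s2 = step(s, a)
        model[(s, a)] = (r, s2)                     # fit the one-step world model
        td = r + GAMMA * V[s2] - V[s]               # TD error drives both updates
        V[s] += ALPHA * td                          # critic update
        for act, p in zip(ACTIONS, pi):             # actor: softmax policy-gradient step
            grad = (1.0 if act == a else 0.0) - p
            prefs[(s, act)] += ALPHA * td * grad
        for _ in range(PLAN_STEPS):                 # "model-based": plan with the learned model
            ps, pa = random.choice(list(model))
            pr, ps2 = model[(ps, pa)]
            V[ps] += ALPHA * (pr + GAMMA * V[ps2] - V[ps])
        s = s2

print({s: round(V[s], 2) for s in range(N_STATES)})  # learned values rise toward the goal
```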

Theory of Change

Fairly direct alignment via changing training to reflect the actual human reward signal. Get real data about the map (reward, training data) → (human values) to help theorise about the corresponding map in AIs; “understand human social instincts, and then maybe adapt some aspects of those for AGIs, presumably in conjunction with other non-biological ingredients”.
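As a toy analogy for studying the “(reward, training data) → (values)” map (not the agenda’s actual methodology, which concerns human neuroscience), the sketch below trains the same tabular TD(0) learner on an identical experience stream under two hand-written reward functions and compares the learned value tables. The reward variants, environment, and parameters are hypothetical, invented purely for illustration.

```python
# Toy analogy: hold the "training data" (experience stream) fixed, vary the reward
# function, and observe how the learned values change. Illustrative only.
import random

STATES, GOAL = range(5), 4

def make_reward(penalise_backtracking):
    # Two hypothetical innate reward signals, invented for this illustration.
    def r(s, s2):
        base = 1.0 if s2 == GOAL else 0.0
        return base - (0.1 if penalise_backtracking and s2 < s else 0.0)
    return r

def train(reward_fn, episodes=300, alpha=0.1, gamma=0.95, seed=0):
    rng = random.Random(seed)       # same seed => identical experience stream across runs
    V = [0.0] * len(STATES)
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            s2 = max(0, min(GOAL, s + rng.choice((-1, 1))))  # fixed random behaviour policy
            V[s] += alpha * (reward_fn(s, s2) + gamma * V[s2] - V[s])  # TD(0) update
            s = s2
    return [round(v, 2) for v in V]

for name, flag in (("goal reward only", False),
                   ("goal reward + backtrack penalty", True)):
    print(name, "->", train(make_reward(flag)))
```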

Broad Approach

cognitivist science

Target Case

worst-case

Key People

Steve Byrnes

Funding

Astera Institute

Estimated FTEs: 1-5

Critiques

Tsvi BT

Outputs in 2025

6 items in the review. See the wiki/summaries/ entries with frontmatter agenda: brainlike-agi-safety (those entries were generated alongside this file from the same export).

Source

Sources cited

Primary URLs harvested from this page’s summary references. This list is auto-generated by scripts/backfill_citations.py; edit it by re-running the script, not by hand.