Brainlike-AGI Safety — SR2025 Agenda Snapshot
One-sentence summary: Social and moral instincts are (partly) implemented in particular hardwired brain circuitry; let’s figure out what those circuits are and how they work; this will involve symbol grounding. The expected AGI paradigm is “a yet-to-be-invented variation on actor-critic model-based reinforcement learning”.
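To make the quoted paradigm concrete, here is a minimal, purely illustrative sketch of actor-critic model-based RL in which reward comes from a fixed “steering” function standing in for hardwired instinct circuitry, while the policy, value function, and world model are learned. Everything here (the toy chain environment, the `steering_subsystem` name, Dyna-style planning, the hyperparameters) is a hypothetical illustration under toy assumptions, not Byrnes’s actual proposal.

```python
# Illustrative only: a toy actor-critic loop where reward is computed by a
# fixed, "hardwired" circuit, kept separate from the learned components.
import math
import random

N_STATES, N_ACTIONS = 10, 2      # toy chain environment: move left or right
GOAL = N_STATES - 1

def steering_subsystem(state):
    """Hardwired reward circuit: fixed, not learned (analogue of innate drives)."""
    return 1.0 if state == GOAL else -0.01

def step(state, action):
    """Ground-truth environment dynamics on the chain."""
    return max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))

# Learned components: critic V, actor preferences H, and a one-step world model.
V = [0.0] * N_STATES
H = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
model = {}                        # (state, action) -> observed next state

def policy(state):
    """Sample an action from a softmax over the actor's preferences."""
    exps = [math.exp(h) for h in H[state]]
    r = random.random() * sum(exps)
    acc = 0.0
    for a, e in enumerate(exps):
        acc += e
        if r <= acc:
            return a
    return N_ACTIONS - 1

alpha, beta, gamma = 0.1, 0.1, 0.95
for episode in range(500):
    s = 0
    for _ in range(50):
        a = policy(s)
        s2 = step(s, a)
        model[(s, a)] = s2                 # learn the world model from experience
        r = steering_subsystem(s2)         # reward from the fixed circuit
        delta = r + gamma * V[s2] - V[s]   # TD error
        V[s] += alpha * delta              # critic update
        H[s][a] += beta * delta            # actor update
        s = s2
        if s == GOAL:
            break
    # Model-based refinement: a few "imagined" updates using the learned model.
    for _ in range(20):
        (ps, pa), ps2 = random.choice(list(model.items()))
        d = steering_subsystem(ps2) + gamma * V[ps2] - V[ps]
        V[ps] += alpha * d
        H[ps][pa] += beta * d

print("Learned state values:", [round(v, 2) for v in V])
```

The Dyna-style inner loop (replaying transitions from the learned model) is one standard way to make an actor-critic agent model-based; the agenda’s “yet-to-be-invented variation” would of course differ in essentially every detail.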
Theory of Change
Fairly direct alignment via changing training to reflect actual human reward. Gather real data on the map (reward, training data) → (human values), to help theorize the corresponding map in AIs; in the review’s words: “understand human social instincts, and then maybe adapt some aspects of those for AGIs, presumably in conjunction with other non-biological ingredients”.
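The theory of change centers on a single map that can be stated precisely even though nobody knows how to compute it. As a hedged illustration (all names here are hypothetical, not from the source), a type signature for that map:

```python
# A hypothetical signature for the object the agenda wants to theorize:
# how do (innate reward function, lifetime training data) determine an
# agent's eventual values? Names are illustrative, not from the source.
from typing import Any, Callable, Sequence

RewardFn = Callable[[Any], float]   # innate reward circuitry (hardwired)
Experience = Sequence[Any]          # lifetime training data
Values = Any                        # the learned values/goals that result

def values_from(reward: RewardFn, experience: Experience) -> Values:
    """The map studied empirically in humans (neuroscience + behavior)
    to constrain a future theory of the same map in brain-like AGIs.
    No implementation is known; this stub only fixes the interface."""
    raise NotImplementedError
```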
Broad Approach
cognitivist science
Target Case
worst-case
Key People
Steve Byrnes
Funding
Astera Institute
Estimated FTEs: 1-5
Critiques
Outputs in 2025
6 items in the review. See the wiki/summaries/ entries with frontmatter agenda: brainlike-agi-safety; these were generated alongside this file from the same export.
Source
- Row in shallow-review-2025/agendas.csv (name = Brainlike-AGI Safety) — Shallow Review of Technical AI Safety 2025.
Related Pages
Sources cited
Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running, not by hand.
- Summary: AI Safety (Wikipedia) — referenced as [[ai-safety]]