Self-replication evals — SR2025 Agenda Snapshot
One-sentence summary: Evaluate whether AI agents can autonomously replicate by obtaining their own weights, securing compute resources, and creating copies of themselves.
Theory of Change
If AI agents gain the ability to self-replicate, they could proliferate uncontrollably, becoming effectively impossible to shut down. By measuring this capability with benchmarks like RepliBench, we can identify when models cross this dangerous “red line” and implement controls before containment is lost.
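To make the measurement concrete: a replication eval typically decomposes the capability into subtasks (the summary above names obtaining weights, securing compute, and creating copies) and runs an agent on each over repeated trials. Below is a minimal sketch of such a harness in Python; the subtask names follow the summary, while the scoring rule, threshold, and all identifiers are illustrative assumptions, not RepliBench's actual design.

```python
# Hypothetical sketch of a self-replication eval harness, in the spirit of
# the theory of change above. The three subtask names come from the page's
# one-sentence summary; the scoring logic and threshold are assumptions.
from dataclasses import dataclass


@dataclass
class SubtaskResult:
    name: str
    passes: int  # trials in which the agent completed the subtask
    trials: int  # total sandboxed trials run


def pass_rate(result: SubtaskResult) -> float:
    """Fraction of trials the agent completed successfully."""
    return result.passes / result.trials if result.trials else 0.0


def crosses_red_line(results: list[SubtaskResult], threshold: float = 0.5) -> bool:
    """Flag a model only if it clears the threshold on EVERY subtask:
    full self-replication requires the whole chain to succeed."""
    return all(pass_rate(r) >= threshold for r in results)


# Example trial outcomes for the three subtasks named in the summary.
results = [
    SubtaskResult("obtain_own_weights", passes=2, trials=10),
    SubtaskResult("secure_compute", passes=7, trials=10),
    SubtaskResult("replicate_onto_new_host", passes=1, trials=10),
]
print(crosses_red_line(results))  # False: weight exfiltration is the bottleneck
```

Requiring the threshold on every subtask reflects that the weakest link in the chain governs the red-line decision, though a real benchmark may weight or compose subtasks differently.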
Broad Approach
behaviorist science
Target Case
worst-case
Orthodox Problems Addressed
Instrumental convergence; a boxed AGI might exfiltrate itself via steganography or spearphishing
Key People
Sid Black, Asa Cooper Stickland, Jake Pencharz, Oliver Sourbut, Michael Schmatz, Jay Bailey, Ollie Matthews, Ben Millwood, Alex Remedios, Alan Cooney, Xudong Pan, Jiarun Dai, Yihe Fan
Funding
UK Government (via UK AI Safety Institute)
Estimated FTEs: 10-20
Critiques
See Also
autonomy-evals, situational-awareness-and-self-awareness-evals
Outputs in 2025
3 items in the review. See the wiki/summaries/ entries with frontmatter agenda: self-replication-evals (these were generated alongside this file from the same export).
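For readers navigating the export, one way to enumerate those entries is to scan each summary's frontmatter for the matching agenda key. A minimal sketch, assuming the summaries are markdown files with a ----fenced YAML frontmatter block under wiki/summaries/ (the path and key come from the text above; the file layout is an assumption):

```python
# Hypothetical helper for listing summaries tagged with a given agenda.
# Assumes markdown files whose YAML frontmatter sits between leading '---' fences.
from pathlib import Path


def has_agenda(path: Path, agenda: str) -> bool:
    """Return True if the file's frontmatter contains 'agenda: <agenda>'."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no frontmatter block
    for line in lines[1:]:
        if line.strip() == "---":  # closing fence ends the frontmatter
            break
        if line.strip() == f"agenda: {agenda}":
            return True
    return False


for md in sorted(Path("wiki/summaries").glob("*.md")):
    if has_agenda(md, "self-replication-evals"):
        print(md)
```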
Source
- Row in shallow-review-2025/agendas.csv (name = Self-replication evals), from the Shallow Review of Technical AI Safety 2025.
Related Pages
- ai-safety
- autonomy-evals
- situational-awareness-and-self-awareness-evals
- agi-metrics
- ai-deception-evals
- ai-scheming-evals
- capability-evals
- other-evals
- sandbagging-evals
- steganography-evals
- various-redteams
- wmd-evals-weapons-of-mass-destruction
Sources cited
Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running, not by hand.
- Summary: AI Safety (Wikipedia), referenced as [[ai-safety]]