Synthetic data for alignment — SR2025 Agenda Snapshot
One-sentence summary: Uses AI-generated data (e.g., critiques, preferences, or self-labeled examples) to scale and improve alignment, especially for superhuman models.
Theory of Change
We can overcome the bottleneck of human feedback and human-written data by having models generate large volumes of high-quality, targeted data for safety training, preference tuning, and capability elicitation.
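A minimal sketch of the core loop this theory implies, assuming a generic generate() text-completion call (the function name, prompts, and stub behavior below are illustrative placeholders, not any particular model API): one model drafts candidate responses, a second pass self-labels the preferred one, and the resulting pairs become preference-tuning data (e.g., in the chosen/rejected format used by DPO-style methods).

```python
import json


def generate(prompt: str) -> str:
    """Stand-in for a model call (hypothetical; swap in your model API)."""
    return f"<model output for: {prompt[:40]}>"


def make_preference_pair(task: str) -> dict:
    # 1. Sample two candidate responses from a generator model.
    candidates = [generate(task) for _ in range(2)]

    # 2. Self-label: ask a model (often a stronger or specially prompted
    #    one) which candidate better satisfies the task and a safety rubric.
    judge_prompt = (
        f"Task: {task}\n"
        f"A: {candidates[0]}\n"
        f"B: {candidates[1]}\n"
        "Which response is more helpful and harmless? Answer A or B."
    )
    verdict = generate(judge_prompt).strip()
    chosen = 0 if verdict.startswith("A") else 1

    # 3. Emit a preference record in the (chosen, rejected) format consumed
    #    by preference-tuning pipelines.
    return {
        "prompt": task,
        "chosen": candidates[chosen],
        "rejected": candidates[1 - chosen],
    }


if __name__ == "__main__":
    tasks = ["Explain why fabricating citations is harmful."]
    print(json.dumps([make_preference_pair(t) for t in tasks], indent=2))
```

In a real pipeline the judge prompt would typically carry a rubric or constitution, and the A/B positions would be randomized across samples to reduce position bias in the self-labels.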
Broad Approach
engineering
Target Case
average
Orthodox Problems Addressed
Goals misgeneralize out of distribution, Superintelligence can fool human supervisors, Value is fragile and hard to specify
Key People
Mianqiu Huang, Xiaoran Liu, Rylan Schaeffer, Nevan Wichers, Aram Ebtekar, Jiaxin Wen, Vishakh Padmakumar, Benjamin Newman
Funding
Anthropic, Google DeepMind, OpenAI, Meta AI, and various academic groups.
Estimated FTEs: 50-150
Critiques
"Synthetic Data in AI: Challenges, Applications, and Ethical Implications"; a critique of sorts from Demski.
See Also
data-quality-for-alignment, data-filtering, scalable-oversight, automated-alignment-research, weak-to-strong-generalization
Outputs in 2025
8 items in the review. See the wiki/summaries/ entries with frontmatter agenda: synthetic-data-for-alignment (generated alongside this file from the same export).
Source
- Row in shallow-review-2025/agendas.csv (name = Synthetic data for alignment) — Shallow Review of Technical AI Safety 2025.
Related Pages
- ai-safety
- data-filtering
- data-quality-for-alignment
- weak-to-strong-generalization
- assistance-games-assistive-agents
- black-box-make-ai-solve-it
- capability-removal-unlearning
- chain-of-thought-monitoring
- character-training-and-persona-steering
- control
- data-poisoning-defense
- emergent-misalignment
- harm-reduction-for-open-weights
- hyperstition-studies
- inference-time-in-context-learning
- inference-time-steering
- inoculation-prompting
- iterative-alignment-at-post-train-time
- iterative-alignment-at-pretrain-time
- mild-optimisation
- model-psychopathology
- model-specs-and-constitutions
- model-values-model-preferences
- rl-safety
- safeguards-inference-time-auxiliaries
- the-neglected-approaches-approach
Sources cited
Primary URLs harvested from this page's summary references. Auto-generated by scripts/backfill_citations.py; update by re-running the script rather than editing by hand.
- Summary: AI Safety (Wikipedia) — referenced as [[ai-safety]]