The “Neglected Approaches” Approach — SR2025 Agenda Snapshot
One-sentence summary: Agenda-agnostic approaches to identifying good but overlooked empirical alignment ideas, working with theorists who could use engineering support, and prototyping those ideas.
Theory of Change
Empirical search for “negative alignment taxes” (prioritizing methods that simultaneously enhance alignment and capabilities)
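A rough formalisation of the search criterion (the notation here is ours, not the review's): write $P_{\text{base}}$ for a model's capability score without an alignment intervention and $P_{\text{aligned}}$ for its score with the intervention applied. The alignment tax is then

$$\tau = P_{\text{base}} - P_{\text{aligned}},$$

and a negative alignment tax ($\tau < 0$) means the intervention improves capabilities along with alignment, which is the property this agenda prioritises.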
Broad Approach
engineering
Target Case
average
Orthodox Problems Addressed
Someone else will deploy unsafe superintelligence first
Key People
AE Studio, Gunnar Zarncke, Cameron Berg, Michael Vaiana, Judd Rosenblatt, Diogo Schwerz de Lucena
Funding
AE Studio
Estimated FTEs: 15
Critiques
The ‘Alignment Bonus’ is a Dangerous Mirage
See Also
Iterative alignment, automated alignment research, Beijing Key Laboratory of Safe AI and Superalignment, Aligned AI
Outputs in 2025
3 items in the review. See the wiki/summaries/ entries with frontmatter agenda: the-neglected-approaches-approach (these were generated alongside this file from the same export).
Source
- Row in shallow-review-2025/agendas.csv (name = The “Neglected Approaches” Approach) — Shallow Review of Technical AI Safety 2025.
Related Pages
- ai-safety
- assistance-games-assistive-agents
- black-box-make-ai-solve-it
- capability-removal-unlearning
- chain-of-thought-monitoring
- character-training-and-persona-steering
- control
- data-filtering
- data-poisoning-defense
- data-quality-for-alignment
- emergent-misalignment
- harm-reduction-for-open-weights
- hyperstition-studies
- inference-time-in-context-learning
- inference-time-steering
- inoculation-prompting
- iterative-alignment-at-post-train-time
- iterative-alignment-at-pretrain-time
- mild-optimisation
- model-psychopathology
- model-specs-and-constitutions
- model-values-model-preferences
- rl-safety
- safeguards-inference-time-auxiliaries
- synthetic-data-for-alignment
Sources cited
Primary URLs harvested from this page’s summary references. Auto-generated by scripts/backfill_citations.py; edit by re-running the script, not by hand.
- Summary: AI Safety (Wikipedia) — referenced as [[ai-safety]]