Recommendations for Technical AI Safety Research Directions
Anthropic Alignment Science Team — 2025 — Alignment Science Blog
Summary
Anthropic’s Alignment Science team presents a broad research agenda covering open problems in AI safety, including capability and alignment evaluations, AI control, scalable oversight, adversarial robustness, and multi-agent alignment.
Source
- Link: https://alignment.anthropic.com/2025/recommended-directions/index.html
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- anthropic — Labs (giant companies)