UK AISI’s Alignment Team: Research Agenda
Benjamin Hilton, Jacob Pfau, Marie_DB, Geoffrey Irving — 2025-05-07 — UK AI Safety Institute — LessWrong
Summary
UK AISI’s Alignment Team presents its research agenda, centred on developing safety case sketches to identify gaps in alignment approaches, with an initial emphasis on training honest AI systems via scalable oversight and asymptotic guarantees.
Source
- Link: https://lesswrong.com/posts/tbnw7LbNApvxNLAg8/uk-aisi-s-alignment-team-research-agenda
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- asymptotic-guarantees — Theory