Buck Shlegeris

Buck Shlegeris is the CEO of redwood-research and one of the leading architects of the ai-control paradigm — a distinct approach to managing catastrophic risk from AI systems that has become increasingly central to safety strategy at major AI labs.

The AI Control Paradigm

Unlike ai-alignment research, which aims to make AI systems genuinely share human goals, AI control assumes that AI systems may be “very egregiously, very badly misaligned” and develops techniques to safely deploy and use them anyway. Shlegeris frames control as complementary to alignment, not a replacement — it provides a safety net so that even if alignment efforts partially fail, control mechanisms can prevent catastrophic outcomes.

This defense-in-depth approach is valuable precisely because alignment is uncertain: we may not know whether a system is truly aligned until it is too late.

Key Techniques

Shlegeris has developed several influential control techniques:

  • Using untrusted models to monitor each other — Even misaligned AI systems face a coordination problem when monitoring one another, making individual sabotage attempts detectable.
  • Compound probability defense — Attacks requiring multiple sequential steps are exponentially harder for a scheming AI to execute, because the probability of evading detection shrinks with each additional step; if each step has an even chance of being caught, a six-step attack is roughly 30 times less likely to succeed undetected than a single-step one.
  • Simulated escape attempts — Letting an AI believe it has escaped and observing its subsequent behavior to reveal its true objectives.
  • Minimum viable interventions — Many control techniques are relatively inexpensive, making adoption feasible even for less safety-conscious organizations.
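The compound-probability argument above can be sketched with a few lines of arithmetic. This is an illustrative model, not code from the source: it assumes each attack step is independently detected with some fixed probability (the 50% per-step detection rate is an assumed round number).

```python
def undetected_success_prob(p_detect: float, n_steps: int) -> float:
    """Probability that an n-step attack completes with no step detected,
    assuming each step is independently caught with probability p_detect."""
    return (1 - p_detect) ** n_steps

p = 0.5  # assumed per-step detection probability (illustrative)
single = undetected_success_prob(p, 1)  # 0.5
six = undetected_success_prob(p, 6)     # 0.5**6 = 0.015625

print(f"single-step success chance: {single:.4f}")
print(f"six-step success chance:    {six:.6f}")
print(f"ratio: ~{single / six:.0f}x")  # ~32x, i.e. "roughly 30 times"
```

The exponent is what makes the defense scale: each additional step the attacker must complete multiplies in another factor of (1 − p), so even a modest per-step detection rate compounds quickly.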

Intelligence Explosion Scenario

Shlegeris focuses on a concrete deployment scenario: approximately 100,000 AI systems running in parallel, each operating roughly 15 times faster than a human, capable of automating almost all research work. In this scenario, the stakes of control are enormous — a single coordinated escape attempt could be catastrophic — but the scale also creates opportunities for monitoring and cross-checking.
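The scale of this scenario can be made concrete with back-of-envelope arithmetic. The system count and speedup come from the text above; the "human-equivalent researcher" framing is an illustrative assumption, not a figure from the source.

```python
# Numbers from the scenario described above.
n_systems = 100_000  # AI systems running in parallel
speedup = 15         # each runs ~15x faster than a human

# Rough aggregate research throughput, in human-equivalent researchers
# (an assumed simplification: speed multiplies linearly into output).
human_equivalents = n_systems * speedup
print(f"~{human_equivalents:,} human-equivalent researchers")  # ~1,500,000
```

The same scale cuts both ways, as the text notes: 100,000 parallel instances are 100,000 opportunities for a coordinated escape, but also 100,000 streams of behavior that can be cross-checked against one another.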

Ethical Considerations

Shlegeris acknowledges that controlling potentially conscious AI systems raises serious moral questions, but argues that in the near term, the risks of not controlling potentially misaligned systems outweigh the moral costs — especially since control is meant to be a transitional measure while alignment research matures.