Redwood Research

Redwood Research is an AI safety organization focused on the AI control paradigm — developing techniques to safely deploy and use AI systems even if they are misaligned. Led by CEO Buck Shlegeris, Redwood has become one of the most influential organizations shaping practical safety strategy at frontier AI labs.

The AI Control Research Agenda

Redwood’s core research question is: “If our AI systems are actively trying to cause safety failures — trying to grab power, take over — how do we still use them safely?” This worst-case framing distinguishes control from AI alignment research, which aims to make AI systems genuinely share human goals. Control provides a safety net: even if alignment efforts partially fail, control mechanisms can prevent catastrophic outcomes.

The organization has developed several key techniques:

  • Untrusted monitoring — Using misaligned AI systems to monitor each other, exploiting the coordination problem between individually misaligned agents.
  • Compound probability defense — Demonstrating that multi-step attacks are exponentially harder for scheming AI to execute without detection.
  • Simulated escape attempts — Allowing AI to believe it has escaped observation to reveal its true objectives.
  • Minimum viable interventions — Designing control techniques that are inexpensive enough for even less safety-conscious organizations to adopt.
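The untrusted monitoring idea above can be sketched as a simple protocol: one instance of a possibly misaligned model produces output, a second instance scores it for suspicion, and occasional random human audits deter the two from colluding. This is an illustrative sketch only — the function names, the suspicion threshold, and the audit rate are hypothetical stand-ins, not Redwood's actual implementation.

```python
import random

# Hypothetical stand-ins for model calls; names are illustrative, not a real API.
def untrusted_generate(task: str) -> str:
    """The untrusted model produces a candidate solution."""
    return f"solution to {task}"

def untrusted_monitor(output: str) -> float:
    """A separate instance of the same (possibly misaligned) model reviews
    the first model's work and returns a suspicion score in [0, 1]."""
    return 0.1

def human_audit(output: str) -> bool:
    """Trusted but expensive review; returns True if the output looks safe."""
    return True

def run_with_untrusted_monitoring(task: str, threshold: float = 0.5,
                                  audit_rate: float = 0.02):
    """Deploy an untrusted model under untrusted monitoring with random audits."""
    output = untrusted_generate(task)
    suspicion = untrusted_monitor(output)
    # Random audits keep the monitor honest: if it systematically
    # under-reports and an audit catches a bad output anyway,
    # the collusion is exposed.
    if suspicion >= threshold or random.random() < audit_rate:
        if not human_audit(output):
            return None  # blocked
    return output
```

The audit rate is the key lever: it can be small (audits are expensive) because a colluding monitor cannot predict which outputs will be checked.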
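The compound probability argument can be made concrete with a short calculation. Assuming (hypothetically) that each step of a multi-step attack is independently caught by a monitor with probability p, the chance of completing all n steps undetected is (1 − p)^n, which shrinks exponentially in n:

```python
def undetected_probability(p_catch_per_step: float, n_steps: int) -> float:
    """Probability that a multi-step attack evades detection at every step,
    assuming each step is independently caught with probability p_catch_per_step."""
    return (1 - p_catch_per_step) ** n_steps

# Even a weak per-step monitor compounds quickly at p = 0.2:
for n in (1, 5, 10, 20):
    print(n, undetected_probability(0.2, n))  # e.g. n=10 gives roughly 0.107
```

This is why the defense favors long, multi-step attack surfaces: the attacker must win every round, while the defender only needs to win once.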

Impact on the Field

Redwood Research has spawned a growing research agenda around control evaluations, red-teaming, and adversarial testing that assumes worst-case model behavior. The insight that safety planning should not assume alignment succeeds addresses a critical gap in earlier approaches that were sometimes criticized for being contingent on solving alignment in time.

The organization’s work connects to broader safety strategy at companies like Anthropic and OpenAI, where figures such as Nick Joseph have emphasized responsible scaling policies and capability evaluations that reflect a similar assumption: systems should be evaluated against worst-case behavior.

Significance

Redwood Research occupies a distinctive niche in the AI safety ecosystem: neither a frontier AI lab nor a purely theoretical research organization, it focuses on the practical engineering challenge of maintaining safety during the transition to more powerful AI systems. Its control paradigm has become one of the three major strategic approaches to AI safety, alongside alignment research and governance.