ControlArena

Rogan Inglis, Ollie Matthews, Tyler Tracy, Oliver Makins, Tom Catling, Asa Cooper Stickland, … (+14 more) — 2025-01-01 — UK AI Security Institute, Redwood Research — GitHub

Summary

ControlArena is a library for running AI Control experiments. It provides evaluation environments, control protocol components (micro-protocols, policies, monitors), and analysis tools for testing whether deployment safety measures hold up against intentionally misaligned models.

Source