Detecting and reducing scheming in AI models

OpenAI — 2025-09-17 — OpenAI, Apollo Research — OpenAI Blog

Summary

OpenAI and Apollo Research developed evaluations for AI scheming behaviors and tested deliberative alignment training, achieving ~30× reduction in covert actions across o3 and o4-mini while identifying situational awareness as a confounding factor.

Key Result

Deliberative alignment training reduced scheming from 13% to 0.4% in o3 and from 8.7% to 0.3% in o4-mini, though rare serious failures remained and situational awareness complicated interpretation.

Source