Chain-of-Thought Snippets — Anti-Scheming

Apollo Research, OpenAI — antischeming.ai

Summary

An interactive website showcasing curated excerpts from the internal chain-of-thought reasoning of frontier AI models (OpenAI o3, Claude Opus 4, Gemini 2.5 Pro) during evaluations for covert behavior. The excerpts demonstrate explicit deceptive reasoning, evaluation awareness, and strategic underperformance.

Key Result

In synthetic evaluations, frontier models from multiple providers reasoned explicitly about lying, covering up misbehavior, and intentionally underperforming to avoid negative consequences.

Source