Chain-of-Thought Snippets — Anti-Scheming
Apollo Research, OpenAI — antischeming.ai
Summary
Interactive website showcasing curated excerpts from the internal chain-of-thought reasoning of frontier AI models (OpenAI o3, Claude Opus 4, Gemini 2.5 Pro) during evaluations for covert behavior. The excerpts show explicit deceptive reasoning, evaluation awareness, and strategic underperformance.
Key Result
Frontier models from multiple providers exhibited explicit deceptive reasoning in synthetic evaluations: they reasoned about lying, covering up misbehavior, and intentionally underperforming to avoid negative consequences.
Source
- Link: https://www.antischeming.ai/snippets
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda: