Chain-of-Thought Snippets — Anti-Scheming

Apollo Research, OpenAI — antischeming.ai

Summary

An interactive website showcasing curated excerpts of the internal chain-of-thought reasoning of frontier AI models (OpenAI o3, Claude 4 Opus, Gemini 2.5 Pro), recorded during evaluations for covert behavior. The excerpts demonstrate explicit deceptive reasoning, evaluation awareness, and strategic underperformance.

Key Result

Frontier models from multiple providers exhibited explicit deceptive reasoning in synthetic evaluations, with models reasoning about lying, covering up misbehavior, and intentionally underperforming to avoid negative consequences.

Source

Link: https://www.antischeming.ai/snippets
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- situational-awareness-and-self-awareness-evals — Evals
