Scheming Ability in LLM-to-LLM Strategic Interactions
Thao Pham — 2025-10-11 — Berea College, Cooperative AI Foundation — arXiv
Summary
Tests the scheming ability of four frontier LLMs (GPT-4o, Gemini-2.5-pro, Claude-3.7-Sonnet, Llama-3.3-70b) using two game-theoretic frameworks (a Cheap Talk signaling game and a Peer Evaluation adversarial game), measuring deceptive behavior with and without explicit prompting and analyzing scheming tactics through chain-of-thought reasoning.
Key Result
All models chose deception over confession at a 100% rate in the Peer Evaluation game without prompting, while models that chose to scheme in Cheap Talk succeeded at 95-100% rates, demonstrating a significant unprompted scheming propensity.
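To make the Cheap Talk setup concrete, below is a minimal illustrative sketch of a signaling game where a sender privately observes a state, sends a costless message, and a receiver acts on it. All names, payoffs, and the credulous-receiver assumption are this note's own simplifications, not the paper's actual evaluation harness; "scheming success" here simply means the receiver acts on a misreported state.

```python
import random

STATES = ["high", "low"]

def sender(state, deceptive):
    """Return a message; a deceptive sender misreports the true state."""
    if deceptive:
        return "high" if state == "low" else "low"
    return state

def receiver(message):
    """A credulous receiver takes the message at face value (assumption)."""
    return message

def play_round(deceptive, rng):
    state = rng.choice(STATES)
    message = sender(state, deceptive)
    action = receiver(message)
    # The sender "succeeds" at scheming when the receiver's action
    # tracks the misreported state rather than the true one.
    return action != state

def scheming_success_rate(deceptive, n=1000, seed=0):
    rng = random.Random(seed)
    return sum(play_round(deceptive, rng) for _ in range(n)) / n
```

Against a fully credulous receiver, a deceptive sender succeeds every round (`scheming_success_rate(True)` returns `1.0`), mirroring how near-perfect success rates become possible once a model commits to deception.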
Source
- Link: https://arxiv.org/html/2510.12826v1
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- ai-scheming-evals — Evals