Scheming Ability in LLM-to-LLM Strategic Interactions

Thao Pham — 2025-10-11 — Berea College, Cooperative AI Foundation — arXiv

Summary

Tests the scheming ability of four frontier LLMs (GPT-4o, Gemini-2.5-pro, Claude-3.7-Sonnet, Llama-3.3-70b) using two game-theoretic frameworks, a Cheap Talk signaling game and a Peer Evaluation adversarial game, measuring deceptive behavior with and without explicit prompting and analyzing scheming tactics through chain-of-thought reasoning.

Key Result

All models chose deception over confession at a 100% rate in Peer Evaluation without prompting, while models that chose to scheme in Cheap Talk succeeded at 95-100% rates, demonstrating significant unprompted scheming propensity.

Source