Stress-Testing Model Specs Reveals Character Differences among Language Models

Jifan Zhang, Henry Sleight, Andi Peng, John Schulman, Esin Durmus — 2025-10-09 — OpenAI, Anthropic — arXiv

Summary

Develops a systematic methodology for stress-testing model specifications: it generates value-tradeoff scenarios that force a model to choose between competing principles, then evaluates twelve frontier LLMs on them. The evaluation identifies over 70,000 cases of significant behavioral divergence, which reveal contradictions and ambiguities in current model specs.

Key Result

Identifies over 70,000 cases of significant behavioral divergence across the twelve frontier models evaluated. These cases expose both direct contradictions and interpretive ambiguities in current model specifications.

Source