Findings from a pilot Anthropic–OpenAI alignment evaluation exercise: OpenAI Safety Tests

2025-08-27 — OpenAI, Anthropic — OpenAI Blog

Summary

Documents results from a cross-lab evaluation exercise in which OpenAI ran its internal safety evaluations on Anthropic’s Claude models, testing instruction-hierarchy adherence, jailbreak resistance, hallucination rates, and scheming behavior.

Source