Findings from a pilot Anthropic–OpenAI alignment evaluation exercise: OpenAI Safety Tests
2025-08-27 — OpenAI, Anthropic — OpenAI Blog
Summary
Documents results from a cross-lab evaluation exercise in which OpenAI ran its internal safety evaluations on Anthropic’s Claude models, testing instruction hierarchy, jailbreak resistance, hallucination rates, and scheming behaviors.
Source
- Link: https://openai.com/index/openai-anthropic-safety-evaluation
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- openai — Labs (giant companies)