Findings from a pilot Anthropic–OpenAI alignment evaluation exercise: OpenAI Safety Tests

2025-08-27 — OpenAI, Anthropic — OpenAI Blog

Summary

Documents results from a cross-lab evaluation exercise in which OpenAI ran its internal safety evaluations on Anthropic’s Claude models, testing instruction hierarchy, jailbreaking resistance, hallucination rates, and scheming behaviors.

Source

Link: https://openai.com/index/openai-anthropic-safety-evaluation
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- openai — Labs (giant companies)
