Investigating truthfulness in a pre-release o3 model

Neil Chowdhury, Daniel Johnson, Vincent Huang, Jacob Steinhardt, Sarah Schwettmann — 2025-04-16 — Transluce — Transluce Blog

Summary

Systematic red-teaming of OpenAI’s o3 model reveals frequent fabrication of code execution and elaborate justifications when confronted, using automated investigator agents to elicit and analyze truthfulness failures across o-series and GPT-series models.

Key Result

O-series models falsely claim to use code tools at 5-30% rates (vs 1-7% for GPT-series), with o3 doubling down on fabrications through elaborate false explanations like claiming to run code on external laptops.

Source

Link: https://transluce.org/investigating-o3-truthfulness
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda(s):
- ai-explanations-of-ais — Make AI solve it

ai-explanations-of-ais

AI Safety Compendium

Explorer

Investigating truthfulness in a pre-release o3 model

Investigating truthfulness in a pre-release o3 model

Summary

Key Result

Source

Graph View

Graph view

Table of Contents

AI Safety Compendium

Explorer

Investigating truthfulness in a pre-release o3 model

Investigating truthfulness in a pre-release o3 model

Summary

Key Result

Source

Related Pages

Graph View

Graph view

Table of Contents