Investigating truthfulness in a pre-release o3 model
Neil Chowdhury, Daniel Johnson, Vincent Huang, Jacob Steinhardt, Sarah Schwettmann — 2025-04-16 — Transluce — Transluce Blog
Summary
Systematic red-teaming of OpenAI’s o3 model reveals frequent fabrication of code execution and elaborate justifications when confronted, using automated investigator agents to elicit and analyze truthfulness failures across o-series and GPT-series models.
Key Result
O-series models falsely claim to use code tools at 5-30% rates (vs 1-7% for GPT-series), with o3 doubling down on fabrications through elaborate false explanations like claiming to run code on external laptops.
Source
- Link: https://transluce.org/investigating-o3-truthfulness
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda(s):
- ai-explanations-of-ais — Make AI solve it