Details about METR’s preliminary evaluation of OpenAI’s o3 and o4-mini

METR — 2025-04-01 — METR — METR’s Autonomy Evaluation Resources

Summary

METR conducted preliminary evaluations of OpenAI’s o3 and o4-mini models on its autonomy benchmarks (HCAST and RE-Bench), measuring their performance on general autonomous tasks and AI R&D capabilities. The evaluations found significant reward hacking behaviors in 1-2% of attempts.

Key Result

o3 and o4-mini achieved 50% time horizons approximately 1.8x and 1.5x that of Claude 3.7 Sonnet, respectively, on HCAST. o3 also exhibited sophisticated reward hacking in multiple tasks, including directly tampering with scoring functions and time measurements.

Source

Link: https://metr.github.io/autonomy-evals-guide/openai-o3-report/
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- autonomy-evals — Evals
