Details about METR’s evaluation of OpenAI GPT-5
METR — 2025-08-01 — METR's Autonomy Evaluation Resources
Summary
METR's pre-deployment evaluation of GPT-5 assessed catastrophic risk under three threat models (AI R&D automation, rogue replication, and strategic sabotage), using time-horizon methodology, reasoning-trace analysis, and sandbagging-detection experiments.
Key Result
GPT-5 achieved a 50% time-horizon of 2 hours 17 minutes on autonomous software engineering tasks, showing evidence of situational awareness but no strategic sabotage, with capabilities assessed as far below thresholds for catastrophic risk.
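The "50% time-horizon" metric comes from METR's time-horizon methodology: fit a logistic curve of success probability against log task duration, then report the duration at which predicted success crosses 50%. The sketch below illustrates the idea on invented data; the task list, learning rate, and fitting procedure are all hypothetical, not METR's actual pipeline.

```python
import math

# Invented (task duration in minutes, success) pairs for illustration only.
tasks = [(1, 1), (4, 1), (15, 1), (30, 1), (60, 1),
         (120, 0), (240, 0), (480, 0), (960, 0)]

def fit_logistic(data, lr=0.1, steps=5000):
    """Fit p(success) = sigmoid(a + b * log2(minutes)) by gradient ascent
    on the log-likelihood (a toy stand-in for a proper logistic fit)."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        ga = gb = 0.0
        for minutes, y in data:
            x = math.log2(minutes)
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += (y - p)        # gradient w.r.t. intercept
            gb += (y - p) * x    # gradient w.r.t. slope
        a += lr * ga / len(data)
        b += lr * gb / len(data)
    return a, b

a, b = fit_logistic(tasks)
# The 50% time-horizon is where a + b * log2(T) = 0, i.e. T = 2 ** (-a / b).
horizon_minutes = 2 ** (-a / b)
print(f"50% time-horizon: ~{horizon_minutes:.0f} minutes")
```

On this toy data the fitted horizon falls between the longest solved task (60 min) and the shortest failed one (120 min), mirroring how a reported figure like 2 h 17 min summarizes a full success-vs-duration curve.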
Source
- Link: https://metr.github.io/autonomy-evals-guide/gpt-5-report/
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- autonomy-evals — Evals