How Does Time Horizon Vary Across Domains?
2025-07-14 — METR — METR Blog
Summary
Extends METR’s time horizon metric (the length of tasks, measured by how long they take skilled humans, that an AI agent can complete autonomously with 50% probability) from software tasks to multiple other domains. The post develops a methodology for estimating time horizons from public benchmark data and analyzes capability growth trends across 9 benchmarks spanning coding, math, computer use, and self-driving.
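To make the metric concrete, here is a minimal sketch of the basic idea. It assumes per-task success/failure data, fits a logistic curve of success probability against log2 of human task length, and reads off the length at which predicted success is 50%. This illustrates the definition only. It is not the post's procedure for estimating horizons from public benchmark data, and the task lengths and outcomes below are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iters=50):
    """Fit p(success | x) = sigmoid(a + b*x) by Newton's method (maximum likelihood)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g_a = g_b = 0.0            # gradient of the log-likelihood
        h_aa = h_ab = h_bb = 0.0   # entries of X^T W X (negated Hessian)
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step (X^T W X)^{-1} X^T (y - p), using the closed-form 2x2 inverse
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def time_horizon_50(minutes, successes):
    """Task length (minutes) at which the fitted success probability is 50%."""
    xs = [math.log2(m) for m in minutes]
    a, b = fit_logistic(xs, successes)
    # sigmoid(a + b*log2(t)) = 0.5  <=>  a + b*log2(t) = 0  <=>  t = 2 ** (-a / b)
    return 2.0 ** (-a / b)


# Hypothetical results: human task length in minutes, 1 = agent succeeded.
minutes = [2, 4, 8, 32, 64, 128]
successes = [1, 1, 0, 1, 0, 0]
print(f"50% time horizon: {time_horizon_50(minutes, successes):.1f} minutes")
```

The outcomes here are mirror-symmetric around 16 minutes in log space, so the fitted 50% point lands at 16 minutes. The data must mix successes and failures across lengths; perfectly separable outcomes have no finite maximum-likelihood fit.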
Key Result
Software and reasoning domains show time horizons of 50-200+ minutes, doubling every 2-6 months; visual computer use has horizons 40-100x shorter but is improving at similar rates; self-driving improves more slowly, at ~0.6 doublings/year.
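The growth rates above are quoted in two units: doubling time in months, and doublings per year. The small sketch below converts between them and projects a horizon forward under a steady exponential trend. The helper names and the 100-minute starting horizon are hypothetical, and constant-rate extrapolation is an assumption, not a forecast from the post.

```python
def doublings_per_year(doubling_time_months: float) -> float:
    """Convert a doubling time in months to doublings per year."""
    return 12.0 / doubling_time_months


def doubling_time_months(doublings_per_yr: float) -> float:
    """Convert doublings per year to a doubling time in months."""
    return 12.0 / doublings_per_yr


def project_horizon(h0_minutes: float, doubling_months: float, months_ahead: float) -> float:
    """Horizon after `months_ahead` months, assuming a constant exponential trend."""
    return h0_minutes * 2.0 ** (months_ahead / doubling_months)


# Doubling every 2-6 months corresponds to 6 down to 2 doublings per year.
print(doublings_per_year(2), doublings_per_year(6))
# ~0.6 doublings/year corresponds to a doubling time of about 20 months.
print(round(doubling_time_months(0.6), 1))
# Hypothetical: a 100-minute horizon doubling every 4 months grows 8x in a year.
print(project_horizon(100, 4, 12))
```

At ~0.6 doublings/year, self-driving's doubling time is roughly 20 months, several times slower than the 2-6 months quoted for software and reasoning.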
Source
- Link: https://metr.org/blog/2025-07-14-how-does-time-horizon-vary-across-domains/
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda(s):
- autonomy-evals — Evals