How Does Time Horizon Vary Across Domains?

2025-07-14 — METR — METR Blog

Summary

Extends METR’s time horizon metric (the length of task, measured in human completion time, that an AI can complete autonomously with 50% probability) from software tasks to multiple other domains. Develops a methodology for estimating time horizons from public benchmark data and analyzes capability growth trends across 9 benchmarks spanning coding, math, computer use, and self-driving.
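
To make the 50% definition concrete, here is a minimal sketch that assumes success probability follows a logistic curve in the log of task length. The function name `success_probability` and the `slope` parameter are illustrative assumptions, not taken from the post; the actual fitting procedure may differ.

```python
import math

def success_probability(task_minutes, horizon_minutes, slope=1.0):
    """Assumed logistic model: success odds fall as tasks get longer.

    horizon_minutes is the task length at which success is exactly 50%.
    slope (hypothetical parameter) controls how sharply success drops
    per doubling of task length.
    """
    x = slope * (math.log2(horizon_minutes) - math.log2(task_minutes))
    return 1.0 / (1.0 + math.exp(-x))

# With an illustrative 60-minute horizon:
p_at_horizon = success_probability(60, 60)   # task length equals the horizon
p_short = success_probability(15, 60)        # task 4x shorter than the horizon
p_long = success_probability(240, 60)        # task 4x longer than the horizon
```

Under this assumed model the time horizon is simply the point where the curve crosses 50%: shorter tasks succeed more often than not, longer tasks less often.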

Key Result

Software and reasoning domains show time horizons of 50-200+ minutes, doubling every 2-6 months; visual computer use has horizons 40-100x shorter but is improving at similar rates; self-driving improves more slowly, at ~0.6 doublings/year (a doubling time of roughly 20 months).
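
The rates above mix two units (months per doubling and doublings per year). The sketch below shows the conversion and a rough way to read a multiplicative gap as a time lag. The helper names are hypothetical, and the lag calculation assumes both domains keep growing exponentially at the same rate, which the post does not guarantee.

```python
import math

def months_per_doubling(rate_doublings_per_year):
    """Convert a growth rate in doublings/year to a doubling time in months."""
    return 12.0 / rate_doublings_per_year

def doublings_per_year_from_months(doubling_months):
    """Convert a doubling time in months to doublings/year."""
    return 12.0 / doubling_months

def lag_in_months(gap_factor, doubling_months):
    """Months needed to close a multiplicative gap at a fixed doubling time.

    Assumes steady exponential growth (an illustrative simplification).
    """
    return math.log2(gap_factor) * doubling_months

# Self-driving: ~0.6 doublings/year expressed as a doubling time.
self_driving_months = months_per_doubling(0.6)

# Software/reasoning: doubling every 2-6 months expressed as doublings/year.
fast_rate = doublings_per_year_from_months(2)
slow_rate = doublings_per_year_from_months(6)

# A 40x shorter horizon, if closed at a 6-month doubling time.
lag_40x = lag_in_months(40, 6)
```

Read this way, a 2-6 month doubling time corresponds to 2-6 doublings per year, several times the self-driving rate.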

Source