LLMs Outperform Experts on Challenging Biology Benchmarks

Lennart Justen — 2025-05-09 — arXiv

Summary

Systematically evaluates 27 frontier LLMs on eight biology benchmarks spanning molecular biology, genetics, virology, and biosecurity, tracking dangerous capability growth from November 2022 to April 2025.

Key Result

OpenAI’s o3 performs twice as well as expert virologists on the Virology Capabilities Test, and top model performance on that benchmark increased more than 4-fold over the study period.

Source