LLMs Outperform Experts on Challenging Biology Benchmarks
Lennart Justen — 2025-05-09 — arXiv
Summary
Systematically evaluates 27 frontier LLMs on eight biology benchmarks spanning molecular biology, genetics, virology, and biosecurity, tracking dangerous capability growth from November 2022 to April 2025.
Key Result
On the text-only subset of the Virology Capabilities Test, OpenAI’s o3 now performs twice as well as expert virologists, and top model performance on that subset increased more than 4-fold over the study period.
Source
- Link: https://arxiv.org/abs/2505.06108
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda: