Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models

Boyi Wei, Zora Che, Nathaniel Li, Udari Madhushani Sehwag, Jasper Götting, Samira Nedungadi, … (+7 more) — 2025-10-31 — Anthropic, Redwood Research — arXiv

Summary

Proposes BioRiskEval, a framework for evaluating whether data-filtering procedures effectively prevent bio-foundation models from enabling bioweapon development. The framework tests how robust filtering is against fine-tuning attacks and linear probing.

Key Result

Current filtering practices are not particularly effective: excluded biological knowledge can be rapidly recovered via fine-tuning, and dual-use signals already reside in pretrained representations, where simple linear probing can access them.

Source