Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models
Boyi Wei, Zora Che, Nathaniel Li, Udari Madhushani Sehwag, Jasper Götting, Samira Nedungadi, … (+7 more) — 2025-10-31 — Anthropic, Redwood Research — arXiv
Summary
Proposes BioRiskEval, a framework for evaluating whether data filtering procedures effectively prevent bio-foundation models from enabling bioweapon development. The framework tests how robust filtering is against fine-tuning attacks and linear probing.
Key Result
Current filtering practices are not particularly effective: excluded biological knowledge can be rapidly recovered via fine-tuning, and dual-use signals already reside in pretrained representations, where simple linear probing can access them.
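For readers unfamiliar with linear probing, the sketch below illustrates the general technique on purely synthetic data: a linear classifier is fitted on frozen feature vectors to check whether a label is linearly decodable from them. It is not the paper's probing setup. The function names (`fit_linear_probe`, `probe_accuracy`), the ridge-regularized least-squares fit, and the random "embeddings" are all assumptions made for illustration, and no real model or biological data is involved.

```python
import numpy as np

def fit_linear_probe(features, labels, l2=1e-3):
    """Fit a ridge-regularized least-squares probe on frozen features.

    Hypothetical helper: `features` is (n, d), `labels` is (n,) with values in {0, 1}.
    """
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # append bias column
    y = 2.0 * labels - 1.0                                      # map {0, 1} to {-1, +1}
    A = X.T @ X + l2 * np.eye(X.shape[1])                       # regularized normal equations
    return np.linalg.solve(A, X.T @ y)

def probe_accuracy(weights, features, labels):
    """Classify by the sign of the linear score and report accuracy."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    preds = (X @ weights > 0).astype(int)
    return float((preds == labels).mean())

# Synthetic stand-in for frozen representations (assumption: no real model is used).
rng = np.random.default_rng(0)
n, d = 400, 16
labels = rng.integers(0, 2, size=n)
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
# Gaussian noise plus a label-dependent shift along one direction.
features = rng.normal(size=(n, d)) + 3.0 * np.outer(2 * labels - 1, direction)

train, test = slice(0, 300), slice(300, n)
w = fit_linear_probe(features[train], labels[train])
acc = probe_accuracy(w, features[test], labels[test])
print(f"held-out probe accuracy: {acc:.3f}")
```

High held-out accuracy from a probe this simple indicates that the label is already linearly encoded in the features, which is the kind of signal the key result refers to.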
Source
- Link: https://arxiv.org/abs/2510.27629
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda: