Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models

Boyi Wei, Zora Che, Nathaniel Li, Udari Madhushani Sehwag, Jasper Götting, Samira Nedungadi, … (+7 more) — 2025-10-31 — Anthropic, Redwood Research — arXiv

Summary

Proposes BioRiskEval, a framework for evaluating whether data filtering effectively prevents bio-foundation models from enabling bioweapon development. The framework tests how robust filtering is against fine-tuning attacks and linear probing.

Key Result

Current filtering practices are not particularly effective: excluded biological knowledge can be rapidly recovered via fine-tuning, and dual-use signals already reside in the pretrained representations, where simple linear probing can access them.
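The linear-probing part of this result can be pictured with a minimal sketch. This is not the paper's evaluation code. The "embeddings" here are synthetic random vectors, the binary label is a placeholder, and `train_linear_probe` and `probe_accuracy` are hypothetical helper names. The sketch only shows the mechanics of a linear probe: keep the representations frozen, fit a single linear layer on top, and measure held-out accuracy.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(features, labels, epochs=200, lr=0.1):
    """Fit logistic regression on frozen features with batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, features, labels):
    """Fraction of examples the probe classifies correctly."""
    correct = 0
    for x, y in zip(features, labels):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        correct += int(pred == y)
    return correct / len(labels)


# Synthetic stand-in for frozen model embeddings (assumption: the property of
# interest shifts the embedding along one fixed direction, plus Gaussian noise).
rng = random.Random(0)
dim, n = 8, 200
direction = [rng.gauss(0, 1) for _ in range(dim)]
norm = math.sqrt(sum(d * d for d in direction))
direction = [d / norm for d in direction]

labels = [rng.randint(0, 1) for _ in range(n)]
features = [
    [rng.gauss(0, 1) + 2.5 * (2 * y - 1) * d for d in direction]
    for y in labels
]

# Train on the first 150 examples, evaluate on the remaining 50.
w, b = train_linear_probe(features[:150], labels[:150])
acc = probe_accuracy(w, b, features[150:], labels[150:])
print(f"held-out probe accuracy: {acc:.2f}")
```

If a probe this simple scores well above chance on held-out data, the information is linearly readable from the representations and no change to the underlying model is needed to extract it. In practice the features would come from a pretrained model's hidden states rather than a random generator.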

Source

Link: https://arxiv.org/abs/2510.27629
Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- wmd-evals-weapons-of-mass-destruction — Evals
