Open Problems in Machine Unlearning for AI Safety
Fazl Barez, Tingchen Fu, Ameya Prabhu, Stephen Casper, Amartya Sanyal, Adel Bibi, … (+13 more) — 2025-01-09 — University of Oxford, MIT, University of Cambridge, Nanyang Technological University, Tel Aviv University — arXiv
Summary
Identifies key limitations and open problems that prevent machine unlearning from serving as a comprehensive AI safety solution, particularly for managing dual-use knowledge in chemical, biological, radiological, and nuclear (CBRN) and cybersecurity domains, and maps tensions between unlearning and existing safety mechanisms.
Source
- Link: https://arxiv.org/abs/2501.04952
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- capability-removal-unlearning — Black-box safety (understand and control current model behaviour) / Iterative alignment