Open Problems in Machine Unlearning for AI Safety

Fazl Barez, Tingchen Fu, Ameya Prabhu, Stephen Casper, Amartya Sanyal, Adel Bibi, … (+13 more) — 2025-01-09 — University of Oxford, MIT, University of Cambridge, Nanyang Technological University, Tel Aviv University — arXiv

Summary

The paper identifies key limitations and open problems that prevent machine unlearning from serving as a comprehensive AI safety solution, particularly for managing dual-use knowledge in chemical, biological, radiological, and nuclear (CBRN) and cybersecurity domains. It also maps tensions between unlearning and existing safety mechanisms.

Source

Link: https://arxiv.org/abs/2501.04952
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- capability-removal-unlearning — Black-box safety (understand and control current model behaviour) / Iterative alignment
