Unlearning Needs to be More Selective [Progress Report]

Filip Sondej, Yushi Yang, Marcel Windys — 2025-06-27 — LessWrong

Summary

Presents MUDMAN, a new machine unlearning method built on Disruption Masking, which permits a weight update only where the unlearning and retain gradients align. The authors report a 40% improvement over the state-of-the-art TAR method in robustness to fine-tuning attacks.
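
To make the masking idea concrete, here is a minimal sketch in plain Python. It assumes that "align" means the two gradients share the same sign for a given weight, and that both gradients are loss gradients to be descended. The function names `disruption_mask` and `masked_step`, the learning rate, and the toy numbers are hypothetical and chosen only for illustration. This is not the authors' implementation, and it leaves out the MAML component entirely.

```python
def disruption_mask(unlearn_grad, retain_grad):
    """Zero out unlearning-gradient entries that disagree with the retain gradient.

    Assumption: "align" is read as "same sign for that weight". Entries where
    either gradient is zero, or where the signs differ, are masked to 0.0.
    """
    return [u if u * r > 0 else 0.0 for u, r in zip(unlearn_grad, retain_grad)]


def masked_step(weights, unlearn_grad, retain_grad, lr=0.5):
    """Apply one gradient-descent step using only the masked unlearning gradient."""
    masked = disruption_mask(unlearn_grad, retain_grad)
    return [w - lr * g for w, g in zip(weights, masked)]


# Toy example with three weights.
weights = [1.0, 1.0, 1.0]
unlearn_grad = [0.5, -0.25, 0.75]  # gradient of a hypothetical unlearning loss
retain_grad = [0.25, 0.5, 0.0]     # gradient of a hypothetical retain loss

print(disruption_mask(unlearn_grad, retain_grad))       # [0.5, 0.0, 0.0]
print(masked_step(weights, unlearn_grad, retain_grad))  # [0.75, 1.0, 1.0]
```

Only the first weight moves: the second is blocked because the gradients point in opposite directions, and the third because the retain gradient gives no signal there. Under this reading, every update that is let through also reduces the retain loss to first order, which is the sense in which the method is selective.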

Key Result

MUDMAN outperforms the current state-of-the-art unlearning method (TAR) by 40%. It does so by combining selectivity (Disruption Masking), MAML meta-learning, and careful gradient-based updates.

Source