Unlearning Needs to be More Selective [Progress Report]
Filip Sondej, Yushi Yang, Marcel Windys — 2025-06-27 — LessWrong
Summary
Presents MUDMAN, a new machine unlearning method built on Disruption Masking, which allows only those weight updates where the unlearning and retain gradients align. The post reports a 40% improvement over the state-of-the-art TAR method in robustness to fine-tuning attacks.
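To make the masking idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: it reads "align" as "the two gradient components have the same sign", treats gradients as flat lists of floats, and uses a plain gradient step. The function names `disruption_mask` and `masked_step` and the learning rate are hypothetical.

```python
def disruption_mask(unlearn_grad, retain_grad):
    """Keep an unlearning-gradient component only where it has the same
    sign as the matching retain-gradient component; zero it otherwise.

    Assumption: "aligned" means a strictly positive product of the two
    components. The sign convention in the original post may differ.
    """
    if len(unlearn_grad) != len(retain_grad):
        raise ValueError("gradients must have the same length")
    return [u if u * r > 0 else 0.0 for u, r in zip(unlearn_grad, retain_grad)]


def masked_step(weights, unlearn_grad, retain_grad, lr=0.1):
    """Apply one plain gradient step using only the unmasked components."""
    masked = disruption_mask(unlearn_grad, retain_grad)
    return [w - lr * g for w, g in zip(weights, masked)]


# Toy example: components 0 and 3 agree in sign, components 1 and 2 do not.
unlearn_grad = [0.5, -0.2, 0.3, -0.4]
retain_grad = [0.1, 0.3, -0.2, -0.6]
masked = disruption_mask(unlearn_grad, retain_grad)
new_weights = masked_step([1.0, 1.0, 1.0, 1.0], unlearn_grad, retain_grad, lr=0.5)
```

In this toy case only the first and last weights move. The other two are left untouched because the unlearning update there would conflict with the retain gradient.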
Key Result
MUDMAN outperforms the current state-of-the-art unlearning method (TAR) by 40% through combining selectivity (Disruption Masking), MAML meta-learning, and careful gradient-based updates.
Source
- Link: https://lesswrong.com/posts/QYzofMbzmbgiwfqy8/unlearning-needs-to-be-more-selective-progress-report
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- capability-removal-unlearning — Black-box safety (understand and control current model behaviour) / Iterative alignment
- Editorial blurb (verbatim):
[Unlearning Needs to be More Selective \[Progress Report\]](https://lesswrong.com/posts/QYzofMbzmbgiwfqy8/unlearning-needs-to-be-more-selective-progress-report)