Unlearning Isn’t Deletion: Investigating Reversibility of Machine Unlearning in LLMs

Xiaoyu Xu, Xiang Yue, Yang Liu, Qingqing Ye, Huadi Zheng, Peizhao Hu, … (+2 more) — 2025-05-22 — arXiv

Summary

Introduces a representation-level analysis framework for evaluating machine unlearning in LLMs. The paper shows that standard metrics can be misleading: unlearned information is easily restored through minimal fine-tuning, which indicates that current unlearning methods suppress information rather than genuinely erase it.

Key Result

Across six unlearning methods, three data domains, and two LLMs, information appears to be suppressed rather than erased: original behavior is easily restored through minimal fine-tuning. The analysis identifies four distinct forgetting regimes, distinguished by reversibility and catastrophicity.

Source