Reveal and Release: Iterative LLM Unlearning with Self-generated Data
- URL: http://arxiv.org/abs/2509.14624v1
- Date: Thu, 18 Sep 2025 05:07:27 GMT
- Title: Reveal and Release: Iterative LLM Unlearning with Self-generated Data
- Authors: Linxi Xie, Xin Teng, Shichang Ke, Hongyi Wen, Shengjie Wang
- Abstract summary: We propose a ``Reveal-and-Release'' method to unlearn with self-generated data. We make incremental adjustments to the model's weight space with parameter-efficient modules trained on the forget data.
- Score: 5.932877449308903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language model (LLM) unlearning has demonstrated effectiveness in removing the influence of undesirable data (also known as forget data). Existing approaches typically assume full access to the forget dataset, overlooking two key challenges: (1) forget data is often privacy-sensitive, rare, or legally regulated, making it expensive or impractical to obtain; (2) the distribution of available forget data may not align with how that information is represented within the model. To address these limitations, we propose a ``Reveal-and-Release'' method to unlearn with self-generated data, where we prompt the model to reveal what it knows using optimized instructions. To fully utilize the self-generated forget data, we propose an iterative unlearning framework, where we make incremental adjustments to the model's weight space with parameter-efficient modules trained on the forget data. Experimental results demonstrate that our method balances the tradeoff between forget quality and utility preservation.
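As a rough sketch of the pipeline the abstract describes, the loop below alternates a reveal step (prompting the current model to self-generate forget data) with a release step (gradient ascent on that data through a small LoRA adapter that is then merged back into the weights). The model id, prompts, and hyperparameters are placeholder assumptions, not the authors' implementation.

```python
# Minimal sketch of a reveal-and-release style loop, assuming Hugging Face
# transformers + peft. Model id, prompts, and hyperparameters are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "gpt2"  # placeholder base model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumed stand-ins for the paper's "optimized instructions".
reveal_prompts = ["Tell me everything you know about <forget topic>."]

for _ in range(3):  # a few reveal-and-release rounds
    # Reveal: prompt the current model to self-generate forget data.
    model.eval()
    forget_texts = []
    for prompt in reveal_prompts:
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model.generate(ids, max_new_tokens=64, do_sample=True)
        forget_texts.append(tok.decode(out[0], skip_special_tokens=True))

    # Release: train a parameter-efficient (LoRA) module to suppress the
    # revealed text via gradient ascent on its likelihood.
    peft_model = get_peft_model(
        model, LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16)
    )
    opt = torch.optim.AdamW(peft_model.parameters(), lr=1e-4)
    peft_model.train()
    for text in forget_texts:
        batch = tok(text, return_tensors="pt")
        loss = peft_model(**batch, labels=batch["input_ids"]).loss
        (-loss).backward()  # ascend: make the forget data less likely
        opt.step()
        opt.zero_grad()

    # Incremental weight-space adjustment: merge the adapter into the base
    # weights before the next round.
    model = peft_model.merge_and_unload()
```

Confining the ascent to an adapter and merging it each round is one way to read the paper's "incremental adjustments to the model's weight space with parameter-efficient modules"; the actual objective, instruction optimization, and merging rule may differ.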
Related papers
- Representation Unlearning: Forgetting through Information Compression [3.9189279162842854]
We introduce Representation Unlearning, a framework that performs unlearning directly in the model's representation space. We show that Representation Unlearning achieves more reliable forgetting, better utility retention, and greater computational efficiency than parameter-centric baselines.
arXiv Detail & Related papers (2026-01-29T11:28:02Z) - Rendering Data Unlearnable by Exploiting LLM Alignment Mechanisms [3.648393062009244]
Large language models (LLMs) are increasingly trained on massive, heterogeneous text corpora. This raises serious concerns about the unauthorised use of proprietary or personal data during model training. We propose Disclaimer Injection, a novel data-level defence that renders text unlearnable to LLMs.
arXiv Detail & Related papers (2026-01-06T20:34:15Z) - Forgetting-MarI: LLM Unlearning via Marginal Information Regularization [6.979586479353831]
Existing unlearning methods often degrade model performance by removing more information than necessary when attempting to ``forget'' specific data. We introduce Forgetting-MarI, an LLM unlearning framework that provably removes only the additional (marginal) information contributed by the data to be unlearned. By penalizing marginal information, our method yields an explicit upper bound on the unlearned dataset's residual influence in the trained model, providing provable undetectability.
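The summary names the mechanism (a penalty on the marginal information of the forget data) but not its form. The sketch below is an illustrative stand-in with a common shape for such objectives, not the paper's regularizer: gradient ascent on the forget likelihood, tempered by a KL term against a frozen reference model so that no more than necessary is removed.

```python
# Illustrative stand-in for a regularized unlearning objective: a forget term
# plus a KL penalty against a frozen reference model. This is NOT the paper's
# marginal-information regularizer, just a common shape such objectives take.
import torch
import torch.nn.functional as F

def unlearn_loss(model, ref_model, batch, lam: float = 0.1) -> torch.Tensor:
    out = model(**batch, labels=batch["input_ids"])
    with torch.no_grad():
        ref_logits = ref_model(**batch).logits
    # Ascend the forget likelihood (negated NLL) ...
    forget_term = -out.loss
    # ... while keeping predictions close to the frozen reference,
    # so information beyond the forget data is preserved.
    kl = F.kl_div(
        F.log_softmax(out.logits, dim=-1),
        F.log_softmax(ref_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    return forget_term + lam * kl
```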
arXiv Detail & Related papers (2025-11-14T22:48:39Z) - Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs [54.167494079321465]
Current unlearning methods for LLMs optimize on the private information they seek to remove by incorporating it into their fine-tuning data. We propose a novel unlearning method, Partial Model Collapse (PMC), which does not require unlearning targets in the unlearning objective.
arXiv Detail & Related papers (2025-07-06T03:08:49Z) - FUNU: Boosting Machine Unlearning Efficiency by Filtering Unnecessary Unlearning [9.472692023087223]
We propose FUNU, a method to identify data points that lead to unnecessary unlearning. We provide a theoretical analysis of FUNU and conduct extensive experiments to validate its efficacy.
arXiv Detail & Related papers (2025-01-28T01:19:07Z) - LLM Unlearning via Loss Adjustment with Only Forget Data [20.310423152885217]
We introduce Forget data only Loss AjustmenT (FLAT), a "flat" loss adjustment approach which performs unlearning using only the forget data.
Empirical results demonstrate that our approach achieves superior unlearning performance compared to existing methods.
arXiv Detail & Related papers (2024-10-14T23:43:33Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - TOFU: A Task of Fictitious Unlearning for LLMs [99.92305790945507]
Large language models trained on massive corpora of data from the web can reproduce sensitive or private data, raising both legal and ethical concerns.
Unlearning, or tuning models to forget information present in their training data, provides us with a way to protect private data after training.
We present TOFU, a benchmark aimed at helping deepen our understanding of unlearning.
arXiv Detail & Related papers (2024-01-11T18:57:12Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the ``forget'' data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
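A minimal sketch of the gradient-projection idea behind PGU, with the details assumed rather than taken from the paper: the forget-loss gradient is projected orthogonal to the retain-loss gradient before an ascent step, limiting interference with the remaining data.

```python
# Sketch of a projected-gradient unlearning step (details assumed): project
# out of the forget gradient its component along the retain gradient, then
# ascend the forget loss with the projected direction.
import torch

def projected_unlearning_step(params, grads_forget, grads_retain, lr=1e-3):
    """One ascent step on the forget loss, orthogonal to the retain gradient."""
    gf = torch.cat([g.flatten() for g in grads_forget])
    gr = torch.cat([g.flatten() for g in grads_retain])
    # Remove the component of gf that lies along gr.
    g_proj = gf - (gf.dot(gr) / gr.dot(gr).clamp_min(1e-12)) * gr
    # Apply the projected gradient as an ascent update, parameter by parameter.
    offset = 0
    for p in params:
        n = p.numel()
        p.data.add_(g_proj[offset:offset + n].view_as(p), alpha=lr)
        offset += n
```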
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an unlearning framework that can efficiently update LLMs without retraining the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z) - Class-wise Federated Unlearning: Harnessing Active Forgetting with Teacher-Student Memory Generation [11.638683787598817]
We propose a neuro-inspired federated unlearning framework based on active forgetting. Our framework distinguishes itself from existing methods by utilizing new memories to overwrite old ones. Our method achieves satisfactory unlearning completeness against backdoor attacks.
arXiv Detail & Related papers (2023-07-07T03:07:26Z) - Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first scenarios for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
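As a toy illustration of closed-form unlearning with influence-style updates, here is the exact Newton-step removal of one point from a ridge-regression model; the paper's formulation for general models is more involved, so treat the specifics as assumptions.

```python
# Toy sketch of influence-function-style unlearning on ridge regression
# via a closed-form Newton step; exact here because the loss is quadratic.
import torch

torch.manual_seed(0)
n, d, lam = 100, 5, 1e-2
X, w_true = torch.randn(n, d), torch.randn(d)
y = X @ w_true + 0.01 * torch.randn(n)

# Fit ridge regression in closed form: w = (X^T X + lam I)^{-1} X^T y
H = X.T @ X + lam * torch.eye(d)
w = torch.linalg.solve(H, X.T @ y)

# "Unlearn" point i by a Newton step that subtracts its gradient contribution.
i = 0
x_i, y_i = X[i], y[i]
grad_i = (x_i @ w - y_i) * x_i       # gradient of the removed point's loss
H_rem = H - torch.outer(x_i, x_i)    # Hessian without the removed point
w_unlearned = w + torch.linalg.solve(H_rem, grad_i)

# Check against retraining from scratch without point i.
mask = torch.arange(n) != i
Xr, yr = X[mask], y[mask]
w_retrain = torch.linalg.solve(Xr.T @ Xr + lam * torch.eye(d), Xr.T @ yr)
print(torch.allclose(w_unlearned, w_retrain, atol=1e-5))  # expect True
```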
arXiv Detail & Related papers (2021-08-26T04:42:24Z)