Not All Tokens Are Meant to Be Forgotten
- URL: http://arxiv.org/abs/2506.03142v1
- Date: Tue, 03 Jun 2025 17:59:05 GMT
- Title: Not All Tokens Are Meant to Be Forgotten
- Authors: Xiangyu Zhou, Yao Qiang, Saleh Zare Zade, Douglas Zytko, Prashant Khanduri, Dongxiao Zhu
- Abstract summary: Large Language Models (LLMs) exhibit remarkable human-level language understanding, reasoning, and decision-making abilities. However, LLMs tend to memorize unwanted information, such as private or copyrighted content, raising significant privacy and legal concerns.
- Score: 13.060635265281864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs), pre-trained on massive text corpora, exhibit remarkable human-level language understanding, reasoning, and decision-making abilities. However, they tend to memorize unwanted information, such as private or copyrighted content, raising significant privacy and legal concerns. Unlearning has emerged as a promising solution, but existing methods face a significant challenge of over-forgetting. This issue arises because they indiscriminately suppress the generation of all the tokens in forget samples, leading to a substantial loss of model utility. To overcome this challenge, we introduce the Targeted Information Forgetting (TIF) framework, which consists of (1) a flexible targeted information identifier designed to differentiate between unwanted words (UW) and general words (GW) in the forget samples, and (2) a novel Targeted Preference Optimization approach that leverages Logit Preference Loss to unlearn unwanted information associated with UW and Preservation Loss to retain general information in GW, effectively improving the unlearning process while mitigating utility degradation. Extensive experiments on the TOFU and MUSE benchmarks demonstrate that the proposed TIF framework enhances unlearning effectiveness while preserving model utility and achieving state-of-the-art results.
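To make the abstract's token-level split concrete, the sketch below shows one way a per-token unlearning objective of this kind could be wired up in PyTorch: positions marked as unwanted words (UW) receive a suppression term, while general-word (GW) positions keep an ordinary likelihood term so the rest of the sentence is preserved. This is only an illustration under assumptions; the `uw_mask`, the `alpha`/`beta` weights, and the particular suppression term are placeholders and do not reproduce the paper's actual Logit Preference Loss, Preservation Loss, or targeted information identifier.

```python
import torch
import torch.nn.functional as F


def targeted_unlearning_loss(logits, labels, uw_mask, alpha=1.0, beta=1.0):
    """Illustrative per-token loss split (not the paper's exact formulation).

    logits:  (batch, seq_len, vocab) model outputs on a forget sample
    labels:  (batch, seq_len) target token ids
    uw_mask: (batch, seq_len) bool, True where the token is an unwanted word (UW);
             assumed to come from some identifier, taken as given here
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Log-probability the model assigns to each reference token.
    token_logp = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)  # (B, T)

    # Suppression term on UW tokens: -log(1 - p) grows as p rises,
    # so minimizing it pushes the probability of the unwanted token down.
    p = token_logp.exp().clamp(max=1.0 - 1e-6)
    forget_term = -torch.log1p(-p)

    # Preservation term on GW tokens: standard negative log-likelihood,
    # which keeps the model's behavior on general words intact.
    preserve_term = -token_logp

    uw = uw_mask.float()
    gw = 1.0 - uw
    loss = (
        alpha * (uw * forget_term).sum() / uw.sum().clamp(min=1.0)
        + beta * (gw * preserve_term).sum() / gw.sum().clamp(min=1.0)
    )
    return loss
```

The point of restricting the suppression term to UW positions, rather than penalizing every token in the forget sample, is exactly the over-forgetting issue the abstract describes: general words remain anchored by a likelihood term, so utility on unrelated content is less likely to degrade.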
Related papers
- Forgetting: A New Mechanism Towards Better Large Language Model Fine-tuning [53.398270878295754]
Supervised fine-tuning (SFT) plays a critical role for pretrained large language models (LLMs). We suggest categorizing tokens within each corpus into two parts -- positive and negative tokens -- based on whether they are useful to improve model performance. We conduct experiments on well-established benchmarks, finding that this forgetting mechanism not only improves overall model performance but also facilitates more diverse model responses.
arXiv Detail & Related papers (2025-08-06T11:22:23Z) - LoReUn: Data Itself Implicitly Provides Cues to Improve Machine Unlearning [33.62466543549043]
Loss-based Reweighting Unlearning (LoReUn) is a plug-and-play strategy that dynamically reweights data during the unlearning process with minimal additional computational overhead. Our approach significantly reduces the gap between existing MU methods and exact unlearning in both image classification and generation tasks.
arXiv Detail & Related papers (2025-07-30T09:12:25Z) - Rethinking Post-Unlearning Behavior of Large Vision-Language Models [17.951441278605966]
We introduce a new unlearning task for Large Vision-Language Models (LVLMs). This task requires models to provide privacy-preserving yet informative and visually grounded responses. We also propose a novel unlearning method that explicitly guides post-unlearning behavior toward a desirable output distribution.
arXiv Detail & Related papers (2025-06-03T07:28:22Z) - Not Every Token Needs Forgetting: Selective Unlearning to Limit Change in Utility in Large Language Model Unlearning [95.53571199301963]
Conventional unlearning approaches indiscriminately update model parameters to forget all tokens in a target document. We propose Selective Unlearning (SU), which identifies a critical subset of tokens within the forgetting set that is relevant to the unwanted information. Experiments on two benchmarks and six baseline unlearning algorithms demonstrate that SU not only achieves effective unlearning on the targeted forget data, but also significantly preserves the model's utility in the retaining set.
arXiv Detail & Related papers (2025-06-01T07:36:45Z) - Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy [36.19634262653306]
This paper reveals a critical vulnerability in fine-tuning-based unlearning. A malicious user can craft a manipulated forgetting request that stealthily degrades the model's utility for benign users. We propose Scope-aware Unlearning (SU), a lightweight enhancement that introduces a scope term into the unlearning objective.
arXiv Detail & Related papers (2025-05-31T02:57:24Z) - Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models [70.78205685001168]
We investigate knowledge forgetting in large language models with a focus on its generalisation. UGBench is the first benchmark specifically designed to assess the unlearning of in-scope implicit knowledge. We propose PerMU, a novel probability-based unlearning paradigm.
arXiv Detail & Related papers (2025-02-27T11:03:33Z) - Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice [186.055899073629]
Unlearning is often invoked as a solution for removing the effects of targeted information from a generative-AI model. Unlearning is also proposed as a way to prevent a model from generating targeted types of information in its outputs. Both of these goals -- the targeted removal of information from a model and the targeted suppression of information from a model's outputs -- present various technical and substantive challenges.
arXiv Detail & Related papers (2024-12-09T20:18:43Z) - Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs [25.91643745340183]
Large Language Models (LLMs) have demonstrated strong reasoning and memorization capabilities via pretraining on massive textual corpora. This poses risks of privacy and copyright violations, highlighting the need for efficient machine unlearning methods. We propose Low-rank Knowledge Unlearning (LoKU), a novel framework that enables robust and efficient unlearning for LLMs.
arXiv Detail & Related papers (2024-08-13T04:18:32Z) - Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [52.03511469562013]
We introduce the Iterative Contrastive Unlearning (ICU) framework, which consists of three core components. A Knowledge Unlearning Induction module targets specific knowledge for removal using an unlearning loss. A Contrastive Learning Enhancement module preserves the model's expressive capabilities against the pure unlearning goal. An Iterative Unlearning Refinement module dynamically adjusts the unlearning process through ongoing evaluation and updates.
arXiv Detail & Related papers (2024-07-25T07:09:35Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - TOFU: A Task of Fictitious Unlearning for LLMs [99.92305790945507]
Large language models trained on massive corpora of data from the web can reproduce sensitive or private data, raising both legal and ethical concerns.
Unlearning, or tuning models to forget information present in their training data, provides us with a way to protect private data after training.
We present TOFU, a benchmark aimed at helping deepen our understanding of unlearning.
arXiv Detail & Related papers (2024-01-11T18:57:12Z) - Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an efficient unlearning framework that can update LLMs without retraining the whole model after data removal.
arXiv Detail & Related papers (2023-10-31T03:35:59Z)