UniErase: Unlearning Token as a Universal Erasure Primitive for Language Models
- URL: http://arxiv.org/abs/2505.15674v1
- Date: Wed, 21 May 2025 15:53:28 GMT
- Title: UniErase: Unlearning Token as a Universal Erasure Primitive for Language Models
- Authors: Miao Yu, Liang Lin, Guibin Zhang, Xinfeng Li, Junfeng Fang, Ningyu Zhang, Kun Wang, Yang Wang,
- Abstract summary: We introduce UniErase, a novel unlearning paradigm that employs learnable parametric suffix (unlearning token) to steer language models toward targeted forgetting behaviors.<n>UniErase achieves state-of-the-art (SOTA) performance across batch, sequential, and precise unlearning under fictitious and real-world knowledge settings.
- Score: 54.75551043657238
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models require iterative updates to address challenges such as knowledge conflicts and outdated information (e.g., incorrect, private, or illegal contents). Machine unlearning provides a systematic methodology for targeted knowledge removal from trained models, enabling elimination of sensitive information influences. However, mainstream fine-tuning-based unlearning methods often fail to balance unlearning efficacy and model ability, frequently resulting in catastrophic model collapse under extensive knowledge removal. Meanwhile, in-context unlearning, which relies solely on contextual prompting without modifying the model's intrinsic mechanisms, suffers from limited generalizability and struggles to achieve true unlearning. In this work, we introduce UniErase, a novel unlearning paradigm that employs learnable parametric suffix (unlearning token) to steer language models toward targeted forgetting behaviors. UniErase operates through two key phases: (I) an optimization stage that binds desired unlearning outputs to the model's autoregressive probability distribution via token optimization, followed by (II) a lightweight model editing phase that activates the learned token to probabilistically induce specified forgetting objective. Serving as a new research direction for token learning to induce unlearning target, UniErase achieves state-of-the-art (SOTA) performance across batch, sequential, and precise unlearning under fictitious and real-world knowledge settings. Remarkably, in terms of TOFU benchmark, UniErase, modifying only around 3.66% of the LLM parameters, outperforms previous forgetting SOTA baseline by around 4.01 times for model ability with even better unlearning efficacy. Similarly, UniErase, maintaining more ability, also surpasses previous retaining SOTA by 35.96% for unlearning efficacy, showing dual top-tier performances in current unlearing domain.
Related papers
- Forgetting: A New Mechanism Towards Better Large Language Model Fine-tuning [53.398270878295754]
Supervised fine-tuning (SFT) plays a critical role for pretrained large language models (LLMs)<n>We suggest categorizing tokens within each corpus into two parts -- positive and negative tokens -- based on whether they are useful to improve model performance.<n>We conduct experiments on well-established benchmarks, finding that this forgetting mechanism not only improves overall model performance and also facilitate more diverse model responses.
arXiv Detail & Related papers (2025-08-06T11:22:23Z) - Efficient Machine Unlearning via Influence Approximation [75.31015485113993]
Influence-based unlearning has emerged as a prominent approach to estimate the impact of individual training samples on model parameters without retraining.<n>This paper establishes a theoretical link between memorizing (incremental learning) and forgetting (unlearning)<n>We introduce the Influence Approximation Unlearning algorithm for efficient machine unlearning from the incremental perspective.
arXiv Detail & Related papers (2025-07-31T05:34:27Z) - Unified Parameter-Efficient Unlearning for LLMs [25.195126838721492]
Large Language Models (LLMs) have revolutionized natural language processing, enabling advanced understanding and reasoning capabilities across a variety of tasks.<n>This raises significant privacy and security concerns, as models may inadvertently retain and disseminate sensitive or undesirable information.<n>We introduce a novel instance-wise unlearning framework, LLMEraser, which systematically categorizes unlearning tasks and applies precise adjustments using influence functions.
arXiv Detail & Related papers (2024-11-30T07:21:02Z) - Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [52.03511469562013]
We introduce the Iterative Contrastive Unlearning (ICU) framework, which consists of three core components.<n>A Knowledge Unlearning Induction module targets specific knowledge for removal using an unlearning loss.<n>A Contrastive Learning Enhancement module preserves the model's expressive capabilities against the pure unlearning goal.<n>An Iterative Unlearning Refinement module dynamically adjusts the unlearning process through ongoing evaluation and updates.
arXiv Detail & Related papers (2024-07-25T07:09:35Z) - Learning to Unlearn for Robust Machine Unlearning [6.488418950340473]
We introduce a novel Learning-to-Unlearn (LTU) framework to optimize the unlearning process.
LTU includes a meta-optimization scheme that facilitates models to effectively preserve generalizable knowledge.
We also introduce a Gradient Harmonization strategy to align the optimization trajectories for remembering and forgetting.
arXiv Detail & Related papers (2024-07-15T07:36:00Z) - Towards Effective Evaluations and Comparisons for LLM Unlearning Methods [97.2995389188179]
This paper seeks to refine the evaluation of machine unlearning for large language models.<n>It addresses two key challenges -- the robustness of evaluation metrics and the trade-offs between competing goals.
arXiv Detail & Related papers (2024-06-13T14:41:00Z) - UNDIAL: Self-Distillation with Adjusted Logits for Robust Unlearning in Large Language Models [12.45822383965784]
We introduce UnDIAL (Unlearning via Self-Distillation on Adjusted Logits), a novel and robust unlearning method.
Our approach leverages self-distillation to adjust logits and selectively reduce the influence of targeted tokens.
arXiv Detail & Related papers (2024-02-15T16:21:14Z) - Model Sparsity Can Simplify Machine Unlearning [33.18951938708467]
In response to recent data regulation requirements, machine unlearning (MU) has emerged as a critical process.
Our study introduces a novel model-based perspective: model sparsification via weight pruning.
We show in both theory and practice that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner.
arXiv Detail & Related papers (2023-04-11T02:12:02Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP)
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains under explored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Transfer Learning without Knowing: Reprogramming Black-box Machine
Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model.
Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses.
BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
arXiv Detail & Related papers (2020-07-17T01:52:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.