Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models
- URL: http://arxiv.org/abs/2407.20271v2
- Date: Wed, 9 Oct 2024 14:30:08 GMT
- Title: Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models
- Authors: Haoyu Tang, Ye Liu, Xukai Liu, Kai Zhang, Yanghai Zhang, Qi Liu, Enhong Chen
- Abstract summary: Iterative Contrastive Unlearning (ICU) framework consists of three core components.
A Knowledge Unlearning Induction module removes specific knowledge through an unlearning loss.
A Contrastive Learning Enhancement module preserves the model's expressive capabilities against the pure unlearning goal.
An Iterative Unlearning Refinement module dynamically assesses the extent of unlearning on specific data pieces and makes iterative updates.
- Score: 49.043599241803825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in machine learning, particularly in Natural Language Processing (NLP), have led to the development of sophisticated models trained on extensive datasets, while also raising concerns about the potential leakage of sensitive information. In response, regulatory measures such as the European Union's General Data Protection Regulation (GDPR) have driven increasing interest in Machine Unlearning techniques, which enable models to selectively forget specific data entries. Early approaches primarily relied on pre-processing methods, while more recent research has shifted towards training-based unlearning techniques. Despite their effectiveness, most existing methods require access to the original training data, which is often inaccessible. Additionally, directly applying unlearning techniques bears the cost of undermining the model's expressive capabilities. To address these challenges, we introduce the Iterative Contrastive Unlearning (ICU) framework, which consists of three core components: a Knowledge Unlearning Induction module designed to remove specific knowledge through an unlearning loss; a Contrastive Learning Enhancement module to preserve the model's expressive capabilities against the pure unlearning goal; and an Iterative Unlearning Refinement module that dynamically assesses the extent of unlearning on specific data pieces and makes iterative updates. Experimental results demonstrate the efficacy of our ICU method in unlearning sensitive information while maintaining the model's overall performance, offering a promising solution for privacy-conscious machine learning applications.
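The abstract describes the three ICU modules only at a conceptual level, so the following is a minimal PyTorch-style sketch of how such an iterative contrastive unlearning step might be assembled, not the authors' implementation: the choice of backbone (gpt2), the plain language-modeling loss on analogous "retain" text standing in for the contrastive term, and the FORGET_THRESHOLD stopping rule are all illustrative assumptions.

```python
# Hedged sketch of one ICU-style update, assuming a PyTorch causal LM.
# The paper's exact losses are not given in this abstract; the forget term
# (gradient ascent on the LM objective) and the retain term (ordinary LM loss
# on analogous text) are illustrative stand-ins for the real modules.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # hypothetical backbone; the paper's model may differ
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

FORGET_THRESHOLD = 4.0  # assumed criterion: forget loss must rise above this


def icu_step(forget_texts, retain_texts, alpha=1.0):
    """One combined update: unlearning induction + contrastive enhancement."""
    model.train()
    forget = tokenizer(forget_texts, return_tensors="pt", padding=True, truncation=True)
    retain = tokenizer(retain_texts, return_tensors="pt", padding=True, truncation=True)

    # Knowledge Unlearning Induction: ascend the LM loss on the forget data.
    forget_loss = model(**forget, labels=forget["input_ids"]).loss
    # Contrastive Learning Enhancement (stand-in): keep ordinary LM ability
    # on analogous, non-sensitive text so expressiveness is preserved.
    retain_loss = model(**retain, labels=retain["input_ids"]).loss

    loss = -forget_loss + alpha * retain_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return forget_loss.item()


def icu_refinement(forget_texts, retain_texts, max_iters=50):
    """Iterative Unlearning Refinement (sketch): keep updating only the
    examples whose forget loss has not yet reached the assumed threshold."""
    pending = list(forget_texts)
    for _ in range(max_iters):
        if not pending:
            break
        still_pending = []
        for text in pending:
            fl = icu_step([text], retain_texts)
            if fl < FORGET_THRESHOLD:  # not yet forgotten enough
                still_pending.append(text)
        pending = still_pending
    return pending  # examples that never reached the threshold
```

In this sketch the refinement loop simply drops a forget example once its language-modeling loss exceeds the assumed threshold, mirroring the idea of dynamically assessing the unlearning extent per data piece and updating iteratively.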
Related papers
- Zero-shot Class Unlearning via Layer-wise Relevance Analysis and Neuronal Path Perturbation [11.174705227990241]
Machine unlearning is a technique that removes specific data influences from trained models without the need for extensive retraining.
This paper presents a novel approach to machine unlearning by employing Layer-wise Relevance Analysis and Neuronal Path Perturbation.
Our method balances machine unlearning performance and model utility by identifying and perturbing highly relevant neurons, thereby achieving effective unlearning.
arXiv Detail & Related papers (2024-10-31T07:37:04Z) - CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept [5.345828824625758]
We propose a novel amortized unlearning approach using codebook features and Sparse Autoencoders (SAEs).
By leveraging a bottleneck to decompose the activation space and regulate information flow, our method efficiently unlearns targeted information while preserving the model's performance on unrelated data.
arXiv Detail & Related papers (2024-10-08T10:26:22Z) - Federated Learning driven Large Language Models for Swarm Intelligence: A Survey [2.769238399659845]
Federated learning (FL) offers a compelling framework for training large language models (LLMs).
We focus on machine unlearning, a crucial aspect for complying with privacy regulations like the Right to be Forgotten.
We explore various strategies that enable effective unlearning, such as perturbation techniques, model decomposition, and incremental learning.
arXiv Detail & Related papers (2024-06-14T08:40:58Z) - Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language model (LLM) unlearning via gradient ascent (GA).
Despite their simplicity and efficiency, we suggest that GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning; a generic sketch of a controlled gradient-ascent step is given after this list.
arXiv Detail & Related papers (2024-06-13T14:41:00Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - Efficient Knowledge Deletion from Trained Models through Layer-wise Partial Machine Unlearning [2.3496568239538083]
This paper introduces a novel class of machine unlearning algorithms.
The first method, partial amnesiac unlearning, integrates layer-wise pruning with amnesiac unlearning.
The second method assimilates layer-wise partial updates into label-flipping and optimization-based unlearning.
arXiv Detail & Related papers (2024-03-12T12:49:47Z) - Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an unlearning framework that can efficiently update LLMs without retraining the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z) - PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z) - Transfer Learning without Knowing: Reprogramming Black-box Machine Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model.
Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses.
BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
arXiv Detail & Related papers (2020-07-17T01:52:34Z)
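For illustration, here is a hedged sketch of a gradient-ascent unlearning step with a simple control on the unlearning extent, in the spirit of the "Unlearning with Control" entry above; the loss cap, the backbone, and the stopping rule are assumptions for this sketch and do not reflect that paper's actual controlling methods.

```python
# Hedged sketch of gradient-ascent (GA) unlearning with a simple extent
# control: ascent on the forget loss is halted once an assumed cap is reached,
# to avoid pushing the model into excessive unlearning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # hypothetical backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

LOSS_CAP = 6.0  # assumed cap: stop ascending once the forget loss is this high


def ga_unlearn(forget_texts, steps=100):
    """Gradient ascent on the LM loss over the forget set, halted by a cap."""
    batch = tokenizer(forget_texts, return_tensors="pt", padding=True, truncation=True)
    for _ in range(steps):
        loss = model(**batch, labels=batch["input_ids"]).loss
        if loss.item() >= LOSS_CAP:  # extent control: forgotten "enough"
            break
        (-loss).backward()  # ascend, i.e. maximize the LM loss on forget data
        optimizer.step()
        optimizer.zero_grad()
    return loss.item()
```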