Related papers: CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept

CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept

URL: http://arxiv.org/abs/2410.10866v1
Date: Tue, 08 Oct 2024 10:26:22 GMT
Title: CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept
Authors: YuXuan Wu, Bonaventure F. P. Dossou, Dianbo Liu,
Abstract summary: We propose a novel amortized unlearning approach using codebook features and Sparse Autoencoders (SAEs) By leveraging a bottleneck to decompose the activation space and regulate information flow, our method efficiently unlearns targeted information while preserving the model's performance on unrelated data.
Score: 5.345828824625758
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) offer extensive knowledge across various domains, but they may inadvertently memorize sensitive, unauthorized, or malicious data, such as personal information in the medical and financial sectors. Machine unlearning methods aim to remove specific information from models after training to address this. However, current approaches require additional model training or struggle to effectively erase particular data points and their associated context due to LLMs' complex, dense, and continuous nature. In this study, we propose a novel amortized unlearning approach using codebook features and Sparse Autoencoders (SAEs). By leveraging a bottleneck to decompose the activation space and regulate information flow, our method efficiently unlearns targeted information while preserving the model's performance on unrelated data. To the best of our knowledge, this is the first work that successfully enables unlearning specific topics with contextual relevance in an LLM, marking a significant step towards real-world applications of machine unlearning.

Related papers

Forgetting-MarI: LLM Unlearning via Marginal Information Regularization [6.979586479353831]
Existing unlearning methods often degrade model performance by removing more information than necessary when attempting to ''forget'' specific data.<n>We introduce Forgetting-MarI, an LLM unlearning framework that provably removes only the additional (marginal) information contributed by the data to be unlearned.<n>By penalizing marginal information, our method yields an explicit upper bound on the unlearn dataset's residual influence in the trained models, providing provable undetectability.
arXiv Detail & Related papers (2025-11-14T22:48:39Z)
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning [50.45435841411193]
Code Language Models (CLMs) exhibit unintended memorization of sensitive training data, enabling verbatim reproduction of confidential information when specifically prompted.<n>We introduce CodeEraser, an advanced variant that selectively unlearns sensitive memorized segments in code while preserving the structural integrity and functional correctness of the surrounding code.
arXiv Detail & Related papers (2025-09-17T07:12:35Z)
RESTOR: Knowledge Recovery through Machine Unlearning [71.75834077528305]
Large language models trained on web-scale corpora can memorize undesirable datapoints. Many machine unlearning methods have been proposed that aim to 'erase' these datapoints from trained models. We propose the RESTOR framework for machine unlearning based on the following dimensions.
arXiv Detail & Related papers (2024-10-31T20:54:35Z)
Catastrophic Failure of LLM Unlearning via Quantization [36.524827594501495]
We show that applying quantization to models that have undergone unlearning can restore the "forgotten" information. We find that for unlearning methods with utility constraints, the unlearned model retains an average of 21% of the intended forgotten knowledge in full precision.
arXiv Detail & Related papers (2024-10-21T19:28:37Z)
A Closer Look at Machine Unlearning for Large Language Models [46.245404272612795]
Large language models (LLMs) may memorize sensitive or copyrighted content, raising privacy and legal concerns. We discuss several issues in machine unlearning for LLMs and provide our insights on possible approaches.
arXiv Detail & Related papers (2024-10-10T16:56:05Z)
Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [49.043599241803825]
Iterative Contrastive Unlearning (ICU) framework consists of three core components. A Knowledge Unlearning Induction module removes specific knowledge through an unlearning loss. A Contrastive Learning Enhancement module to preserve the model's expressive capabilities against the pure unlearning goal. And an Iterative Unlearning Refinement module that dynamically assess the unlearning extent on specific data pieces and make iterative update.
arXiv Detail & Related papers (2024-07-25T07:09:35Z)
Offset Unlearning for Large Language Models [49.851093293780615]
delta-Unlearning is an offset unlearning framework for black-box LLMs.<n>We show that delta-Unlearning can effectively unlearn target data while maintaining similar or even stronger performance on general out-of-forget-scope tasks.
arXiv Detail & Related papers (2024-04-17T03:39:51Z)
The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements. LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information. Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
Rethinking Machine Unlearning for Large Language Models [85.92660644100582]
We explore machine unlearning in the domain of large language models (LLMs) This initiative aims to eliminate undesirable data influence (e.g., sensitive or illegal information) and the associated model capabilities.
arXiv Detail & Related papers (2024-02-13T20:51:58Z)
Unlearnable Algorithms for In-context Learning [36.895152458323764]
In this paper, we focus on efficient unlearning methods for the task adaptation phase of a pretrained large language model. We observe that an LLM's ability to do in-context learning for task adaptation allows for efficient exact unlearning of task adaptation training data. We propose a new holistic measure of unlearning cost which accounts for varying inference costs.
arXiv Detail & Related papers (2024-02-01T16:43:04Z)
TOFU: A Task of Fictitious Unlearning for LLMs [99.92305790945507]
Large language models trained on massive corpora of data from the web can reproduce sensitive or private data raising both legal and ethical concerns. Unlearning, or tuning models to forget information present in their training data, provides us with a way to protect private data after training. We present TOFU, a benchmark aimed at helping deepen our understanding of unlearning.
arXiv Detail & Related papers (2024-01-11T18:57:12Z)
Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data. This process might suffer from privacy issues and violations of data protection regulations. We propose an efficient unlearning framework that could efficiently update LLMs without having to retrain the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z)
Machine Unlearning of Features and Labels [72.81914952849334]
We propose first scenarios for unlearning and labels in machine learning models. Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.