A Closer Look at Machine Unlearning for Large Language Models
- URL: http://arxiv.org/abs/2410.08109v1
- Date: Thu, 10 Oct 2024 16:56:05 GMT
- Title: A Closer Look at Machine Unlearning for Large Language Models
- Authors: Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, Min Lin,
- Abstract summary: Large language models (LLMs) may memorize sensitive or copyrighted content, raising privacy and legal concerns.
We discuss several issues in machine unlearning for LLMs and provide our insights on possible approaches.
- Score: 46.245404272612795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) may memorize sensitive or copyrighted content, raising privacy and legal concerns. Due to the high cost of retraining from scratch, researchers attempt to employ machine unlearning to remove specific content from LLMs while preserving the overall performance. In this paper, we discuss several issues in machine unlearning for LLMs and provide our insights on possible approaches. To address the issue of inadequate evaluation of model outputs after unlearning, we introduce three additional metrics to evaluate token diversity, sentence semantics, and factual correctness. We then categorize unlearning methods into untargeted and targeted, and discuss their issues respectively. Specifically, the behavior that untargeted unlearning attempts to approximate is unpredictable and may involve hallucinations, and existing regularization is insufficient for targeted unlearning. To alleviate these issues, we propose using the objective of maximizing entropy (ME) for untargeted unlearning and incorporate answer preservation (AP) loss as regularization for targeted unlearning. Experimental results across three scenarios, i.e., fictitious unlearning, continual unlearning, and real-world unlearning, demonstrate the effectiveness of our approaches. The code is available at https://github.com/sail-sg/closer-look-LLM-unlearning.
Related papers
- Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge [36.524827594501495]
We show that applying quantization to models that have undergone unlearning can restore the "forgotten" information.
We find that for unlearning methods with utility constraints, the unlearned model retains an average of 21% of the intended forgotten knowledge in full precision.
arXiv Detail & Related papers (2024-10-21T19:28:37Z) - Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning [38.03304773600225]
Large language models (LLMs) serve as giant information stores, often including personal or copyrighted data, and retraining them from scratch is not a viable option.
We propose MUNCH, a simple uncertainty-based approach that breaks down multi-hop queries into subquestions and leverages the uncertainty of the unlearned model in final decision-making.
arXiv Detail & Related papers (2024-10-17T07:00:15Z) - Practical Unlearning for Large Language Models [23.515444452866404]
Machine unlearning (MU) has emerged as a promising solution to address these issues.
MU typically assumes full access to the original training data to preserve utility.
Existing LLM unlearning methods often assume access to data most affected by undesired data unlearning.
We propose the O3 framework to overcome these challenges and achieve practical LLM unlearning.
arXiv Detail & Related papers (2024-07-14T14:26:17Z) - UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI [50.61495097098296]
We revisit the paradigm in which unlearning is used for Large Language Models (LLMs)
We introduce a concept of ununlearning, where unlearned knowledge gets reintroduced in-context.
We argue that content filtering for impermissible knowledge will be required and even exact unlearning schemes are not enough for effective content regulation.
arXiv Detail & Related papers (2024-06-27T10:24:35Z) - Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language models (LLMs) unlearning via gradient ascent (GA)
Despite their simplicity and efficiency, we suggest that GA-based methods face the propensity towards excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z) - Large Language Model Unlearning via Embedding-Corrupted Prompts [10.889859281637406]
We present textbfEmbedding-COrrupted (ECO) Prompts, a lightweight unlearning framework for large language models.
We enforce an unlearned state during inference by employing a prompt classifier to identify and safeguard prompts to forget.
We find that these embedding-corrupted prompts not only lead to desirable outputs that satisfy the unlearning objective but also closely approximate the output from a model that has never been trained on the data intended for forgetting.
arXiv Detail & Related papers (2024-06-12T06:56:20Z) - Offset Unlearning for Large Language Models [49.851093293780615]
Unlearning has emerged as a potential remedy for Large Language Models affected by problematic training data.
We propose $delta$-unlearning, an offset unlearning framework for black-box LLMs.
Experiments demonstrate that $delta$-unlearning can effectively unlearn target data while maintaining similar or even stronger performance on general out-of-forget-scope tasks.
arXiv Detail & Related papers (2024-04-17T03:39:51Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - Rethinking Machine Unlearning for Large Language Models [85.92660644100582]
We explore machine unlearning in the domain of large language models (LLMs)
This initiative aims to eliminate undesirable data influence (e.g., sensitive or illegal information) and the associated model capabilities.
arXiv Detail & Related papers (2024-02-13T20:51:58Z) - Machine Unlearning in Large Language Models [8.14992136443131]
This paper introduces a novel machine unlearning framework into large language models.
Our objectives are to make LLMs not produce harmful, hallucinatory, or privacy-compromising responses.
Experimental results show that our approach effectively meets unlearning objectives without substantially compromising model performance.
arXiv Detail & Related papers (2024-02-03T05:14:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.