UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets
- URL: http://arxiv.org/abs/2503.04693v1
- Date: Thu, 06 Mar 2025 18:40:00 GMT
- Title: UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets
- Authors: Wenyu Wang, Mengqi Zhang, Xiaotian Ye, Zhaochun Ren, Zhumin Chen, Pengjie Ren
- Abstract summary: Large Language Models (LLMs) inevitably acquire harmful information during training on massive datasets. Existing unlearning methods focus on forgetting target data while overlooking the crucial impact of logically related knowledge on the effectiveness of unlearning. We propose Unlearning Improvement via Parameter Extrapolation (UIPE), a method that removes knowledge highly correlated with the forgetting targets.
- Score: 41.0340052199534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) inevitably acquire harmful information during training on massive datasets. LLM unlearning aims to eliminate the influence of such harmful information while maintaining the model's overall performance. Existing unlearning methods, represented by gradient ascent-based approaches, primarily focus on forgetting target data while overlooking the crucial impact of logically related knowledge on the effectiveness of unlearning. In this paper, through both theoretical and experimental analyses, we first demonstrate that a key reason for the suboptimal unlearning performance is that models can reconstruct the target content through reasoning with logically related knowledge. To address this issue, we propose Unlearning Improvement via Parameter Extrapolation (UIPE), a method that removes knowledge highly correlated with the forgetting targets. Experimental results show that UIPE significantly enhances the performance of various mainstream LLM unlearning methods on the TOFU benchmark.
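To make the extrapolation idea concrete, below is a minimal PyTorch sketch of the general technique the abstract describes. This is a hedged illustration, not the authors' released implementation; the function name and the extrapolation strength `alpha` are assumptions.

```python
import torch

def extrapolate_unlearning(theta_0: dict, theta_u: dict, alpha: float = 0.5) -> dict:
    """Extrapolate past the unlearned weights.

    theta_0: state dict of the original model.
    theta_u: state dict after a standard unlearning run (e.g., gradient ascent).
    Returns theta' = theta_u + alpha * (theta_u - theta_0), pushing further
    along the unlearning update so that knowledge correlated with the
    forgetting target is weakened as well. `alpha` is an illustrative knob.
    """
    return {
        name: theta_u[name] + alpha * (theta_u[name] - theta_0[name])
        for name in theta_u
    }

# Usage sketch:
# model.load_state_dict(extrapolate_unlearning(orig_sd, unlearned_sd))
```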
Related papers
- Effective LLM Knowledge Learning via Model Generalization [73.16975077770765]
Large language models (LLMs) are trained on enormous documents that contain extensive world knowledge.
It is still not well-understood how knowledge is acquired via autoregressive pre-training.
In this paper, we focus on understanding and improving LLM knowledge learning.
arXiv Detail & Related papers (2025-03-05T17:56:20Z)
- Multi-Objective Large Language Model Unlearning [3.372396620898397]
Gradient Ascent (GA) is a proactive way to decrease the prediction probability of the model on the target data.
We propose the Multi-Objective Large Language Model Unlearning (MOLLM) algorithm to overcome gradient explosion and catastrophic forgetting.
Our empirical results verify that MOLLM outperforms SOTA GA-based LLM unlearning methods in terms of unlearning effect and model utility preservation.
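As a concrete reference point for this family of methods, here is a minimal sketch of a GA-based unlearning objective with a retain-set term. It assumes a Hugging Face-style causal LM whose forward pass returns a `.loss`; the weighting `lam` is an illustrative assumption, not MOLLM's actual multi-objective scheme.

```python
def unlearning_objective(model, forget_batch, retain_batch, lam=1.0):
    # Gradient ascent on the forget set = minimizing the negated LM loss there;
    # unbounded ascent is what causes the gradient explosion noted above.
    forget_loss = -model(**forget_batch).loss
    # A standard LM loss on held-out retain data counteracts catastrophic forgetting.
    retain_loss = model(**retain_batch).loss
    return forget_loss + lam * retain_loss
```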
arXiv Detail & Related papers (2024-12-29T09:35:56Z)
- Streamlined Federated Unlearning: Unite as One to Be Highly Efficient [12.467630082668254]
"Right to be forgotten" laws and regulations has imposed new privacy requirements on federated learning (FL)<n>We propose a streamlined federated unlearning approach (SFU) aimed at effectively removing the influence of target data while preserving the model's performance on retained data without degradation.
arXiv Detail & Related papers (2024-11-28T12:52:48Z)
- KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [73.34893326181046]
Large language models (LLMs) usually rely on retrieval-augmented generation to exploit knowledge materials on the fly.
We propose KBAlign, an approach designed for efficient adaptation to downstream tasks involving knowledge bases.
Our method uses iterative training with self-annotated data such as Q&A pairs and revision suggestions, enabling the model to grasp the knowledge content efficiently.
arXiv Detail & Related papers (2024-11-22T08:21:03Z)
- Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods [1.9799527196428242]
Large language model unlearning aims to remove harmful information that LLMs have learnt, in order to prevent their use for malicious purposes.
We show that unlearning has a notable impact on general model capabilities.
We show that 5-shot prompting or rephrasing the question in simple ways can lead to an over ten-fold increase in accuracy on unlearning benchmarks.
arXiv Detail & Related papers (2024-11-18T22:31:17Z)
- UNLEARN Efficient Removal of Knowledge in Large Language Models [1.9797215742507548]
This paper proposes a novel method, called UNLEARN, to achieve this objective.
The approach builds upon subspace methods to identify and specifically target the removal of knowledge without adversely affecting other knowledge in the LLM.
Results demonstrate 96% of targeted knowledge can be forgotten while maintaining performance on other knowledge within 2.5% of the original model.
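As a rough, hypothetical illustration of the subspace idea (not the paper's algorithm): estimate a low-rank basis for activations that encode the target knowledge and project it out of a weight matrix's input space.

```python
import torch

def project_out_subspace(W: torch.Tensor, feats: torch.Tensor, rank: int = 4) -> torch.Tensor:
    """Remove the top-`rank` feature directions from a linear layer's weight.

    W: (d_out, d_in) weight matrix; feats: (n_samples, d_in) activations that
    represent the knowledge targeted for removal. Returns W (I - U U^T), so
    the layer ignores the estimated knowledge subspace while other directions
    are left untouched.
    """
    U = torch.linalg.svd(feats.T, full_matrices=False).U[:, :rank]  # (d_in, rank) basis
    return W - (W @ U) @ U.T
```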
arXiv Detail & Related papers (2024-08-08T00:53:31Z)
- Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [52.03511469562013]
We introduce the Iterative Contrastive Unlearning (ICU) framework, which consists of three core components.
A Knowledge Unlearning Induction module targets specific knowledge for removal using an unlearning loss.
A Contrastive Learning Enhancement module preserves the model's expressive capabilities against the pure unlearning goal.
An Iterative Unlearning Refinement module dynamically adjusts the unlearning process through ongoing evaluation and updates.
arXiv Detail & Related papers (2024-07-25T07:09:35Z)
- OAL: Enhancing OOD Detection Using Latent Diffusion [5.357756138014614]
The Outlier Aware Learning (OAL) framework synthesizes OOD training data directly in the latent space.
We introduce a mutual information-based contrastive learning approach that amplifies the distinction between In-Distribution (ID) and collected OOD features.
arXiv Detail & Related papers (2024-06-24T11:01:43Z)
- Towards Effective Evaluations and Comparisons for LLM Unlearning Methods [97.2995389188179]
This paper seeks to refine the evaluation of machine unlearning for large language models.
It addresses two key challenges: the robustness of evaluation metrics and the trade-offs between competing goals.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
- Semantic are Beacons: A Semantic Perspective for Unveiling Parameter-Efficient Fine-Tuning in Knowledge Learning [30.831866499812925]
We propose a semantic perspective to investigate the reasons behind PEFT's limitations in knowledge learning tasks.
PEFT presents a notable risk of pushing the model away from the intended knowledge target.
We introduce a data filtering strategy to exclude data that is detrimental to knowledge learning and a re-weighted learning strategy to make the model attentive to semantic distance.
arXiv Detail & Related papers (2024-05-28T15:47:11Z)
- Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an unlearning framework that can efficiently update LLMs without retraining the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z)