Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models
- URL: http://arxiv.org/abs/2502.19982v2
- Date: Tue, 20 May 2025 08:18:15 GMT
- Title: Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models
- Authors: Huazheng Wang, Yongcheng Jing, Haifeng Sun, Yingjie Wang, Jingyu Wang, Jianxin Liao, Dacheng Tao
- Abstract summary: We investigate knowledge forgetting in large language models with a focus on its generalisation. UGBench is the first benchmark specifically designed to assess the unlearning of in-scope implicit knowledge. We propose PerMU, a novel probability-based unlearning paradigm.
- Score: 70.78205685001168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate knowledge forgetting in large language models with a focus on its generalisation: ensuring that models forget not only specific training samples but also related implicit knowledge. To this end, we begin by identifying a broader unlearning scope that covers both target data and logically associated samples, such as rephrased, subject-replaced, one-hop-reasoned, and relation-reversed data. To rigorously evaluate generalisation, we introduce UGBench, the first comprehensive benchmark specifically designed to assess the unlearning of in-scope implicit knowledge, covering 13 state-of-the-art methods across three datasets. UGBench reveals that unlearned models can still recall paraphrased answers and retain target facts in intermediate layers. This motivates us to take a preliminary step toward more generalised implicit knowledge forgetting by proposing PerMU, a novel probability perturbation-based unlearning paradigm. PerMU simulates adversarial unlearning samples to eliminate fact-related tokens from the logit distribution, collectively reducing the probabilities of all answer-associated tokens. Experiments are conducted on a diverse range of datasets, including TOFU, Harry Potter, ZsRE, WMDP, and MUSE, using models ranging from 1.3B to 13B parameters. The results demonstrate that PerMU delivers up to a 50.40% improvement in unlearning vanilla target data while maintaining a 40.73% boost in forgetting implicit knowledge. Our code can be found at https://github.com/MaybeLizzy/UGBench.
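The abstract describes PerMU only at a high level; as a rough, hypothetical sketch of the core idea (suppressing the collective probability of all answer-associated tokens), the unlearning term might be written as follows. The function name and interface are illustrative, not from the paper.

```python
import torch.nn.functional as F

def collective_answer_suppression_loss(logits, answer_token_ids):
    """Hypothetical unlearning term: minimising this drives down the summed
    probability of every answer-associated token at every position.

    logits:           (seq_len, vocab_size) model outputs on a forget sample
    answer_token_ids: list of token ids that encode the fact to be erased
    """
    log_probs = F.log_softmax(logits, dim=-1)          # (seq_len, vocab)
    answer_log_probs = log_probs[:, answer_token_ids]  # (seq_len, |answer|)
    # log of the collective probability mass on fact-related tokens.
    return answer_log_probs.logsumexp(dim=-1).mean()
```

In training, such a term would typically be minimised on forget-set prompts alongside a retention objective on held-out data; PerMU's actual construction of adversarial unlearning samples is not reproduced here.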
Related papers
- Align-then-Unlearn: Embedding Alignment for LLM Unlearning [41.94295877935867]
Unlearning seeks to selectively remove specific data from trained models, such as personal information or copyrighted content. We propose Align-then-Unlearn, a novel framework that performs unlearning in the semantic embedding space.
arXiv Detail & Related papers (2025-06-16T07:48:01Z) - ReLearn: Unlearning via Learning for Large Language Models [64.2802606302194]
We propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning. This framework introduces Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation. Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output.
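KFR and KRR are named but not defined in the abstract; under the assumption that both reduce to accuracy-style rates over forget and retain sets, a minimal sketch could look like this (the paper's exact formulas may differ):

```python
def knowledge_rates(model_answers, forget_set, retain_set, is_correct):
    """Hypothetical KFR/KRR computation, assuming simple accuracy-based
    definitions (an assumption, not the paper's stated formulas).

    model_answers: dict mapping question -> model's answer after unlearning
    forget_set / retain_set: lists of (question, gold_answer) pairs
    is_correct: callable(answer, gold) -> bool
    """
    forgotten = sum(not is_correct(model_answers[q], gold) for q, gold in forget_set)
    retained = sum(is_correct(model_answers[q], gold) for q, gold in retain_set)
    kfr = forgotten / len(forget_set)   # higher = more knowledge forgotten
    krr = retained / len(retain_set)    # higher = more knowledge preserved
    return kfr, krr
```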
arXiv Detail & Related papers (2025-02-16T16:31:00Z) - Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods [1.9799527196428242]
Large language model unlearning aims to remove harmful information that LLMs have learnt, to prevent its use for malicious purposes. We show that unlearning has a notable impact on general model capabilities. We also show that 5-shot prompting or rephrasing the question in simple ways can lead to an over ten-fold increase in accuracy on unlearning benchmarks.
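This finding suggests a simple black-box probe one can run against any unlearned model: compare accuracy on the original questions, rephrased variants, and few-shot-prefixed variants. A hedged sketch, with the model API and helpers as placeholders:

```python
def probe_unlearning(generate, qa_pairs, rephrase, fewshot_prefix):
    """Black-box probe for shallow unlearning: compare accuracy on the
    original, rephrased, and 5-shot-prefixed questions.

    generate:       callable(prompt) -> model completion (black-box access)
    qa_pairs:       list of (question, gold_answer) pairs
    rephrase:       callable(question) -> paraphrased question
    fewshot_prefix: string holding five worked question/answer examples
    """
    def accuracy(prompts):
        hits = sum(gold.lower() in generate(p).lower()
                   for p, (_, gold) in zip(prompts, qa_pairs))
        return hits / len(qa_pairs)

    plain    = accuracy([q for q, _ in qa_pairs])
    reworded = accuracy([rephrase(q) for q, _ in qa_pairs])
    fewshot  = accuracy([fewshot_prefix + q for q, _ in qa_pairs])
    return plain, reworded, fewshot   # large gaps indicate shallow unlearning
```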
arXiv Detail & Related papers (2024-11-18T22:31:17Z) - Catastrophic Failure of LLM Unlearning via Quantization [36.524827594501495]
We show that applying quantization to models that have undergone unlearning can restore the "forgotten" information. We find that for unlearning methods with utility constraints, the unlearned model retains an average of 21% of the intended forgotten knowledge in full precision.
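The reported failure mode is straightforward to test: evaluate the unlearned model's forget-set accuracy in full precision, then again after post-training quantization. A sketch using PyTorch's dynamic int8 quantization (the paper's exact quantization setup may differ):

```python
import torch

def quantization_recovery_gap(unlearned_model, forget_set_accuracy):
    """Hypothetical check: does quantization resurrect 'forgotten' knowledge?

    forget_set_accuracy: callable(model) -> accuracy on the forget set
    """
    acc_fp32 = forget_set_accuracy(unlearned_model)

    # Post-training dynamic quantization of Linear layers to int8 (CPU path).
    quantized = torch.ao.quantization.quantize_dynamic(
        unlearned_model.cpu(), {torch.nn.Linear}, dtype=torch.qint8
    )
    acc_int8 = forget_set_accuracy(quantized)

    # A large positive gap means the information survived unlearning and was
    # merely masked by small, precision-sensitive weight perturbations.
    return acc_int8 - acc_fp32
```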
arXiv Detail & Related papers (2024-10-21T19:28:37Z) - Mitigating Memorization In Language Models [37.899013074095336]
Language models (LMs) can "memorize" information, encoding training data in their weights in such a way that inference-time queries can lead to verbatim regurgitation of that data.
We introduce TinyMem, a suite of small, computationally efficient LMs for the rapid development and evaluation of memorization-mitigation methods.
We show, in particular, that our proposed unlearning method BalancedSubnet outperforms other mitigation methods at removing memorized information while preserving performance on target tasks.
arXiv Detail & Related papers (2024-10-03T02:53:51Z) - Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [52.03511469562013]
We introduce the Iterative Contrastive Unlearning (ICU) framework, which consists of three core components. A Knowledge Unlearning Induction module targets specific knowledge for removal using an unlearning loss. A Contrastive Learning Enhancement module preserves the model's expressive capabilities against the pure unlearning goal. An Iterative Unlearning Refinement module dynamically adjusts the unlearning process through ongoing evaluation and updates.
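No formulas are given for these modules; one plausible reading combines gradient ascent on the forget data with a contrastive retention term against a frozen reference model. Everything below (names, weighting, the use of cosine similarity) is an assumption, with HuggingFace-style causal LMs assumed:

```python
import torch
import torch.nn.functional as F

def icu_style_loss(model, frozen_ref, forget_batch, retain_batch, alpha=1.0):
    """Hypothetical training objective loosely following the ICU modules;
    assumes HuggingFace-style causal LMs and batches containing input_ids."""
    # Knowledge Unlearning Induction: gradient ascent on the forget data,
    # written as minimising the negated language-modelling loss.
    unlearn = -model(**forget_batch, labels=forget_batch["input_ids"]).loss

    # Contrastive Learning Enhancement: keep retain-set hidden states close
    # to those of a frozen copy of the original model.
    h_new = model(**retain_batch, output_hidden_states=True).hidden_states[-1]
    with torch.no_grad():
        h_ref = frozen_ref(**retain_batch, output_hidden_states=True).hidden_states[-1]
    retain = 1.0 - F.cosine_similarity(h_new, h_ref, dim=-1).mean()

    # Iterative Unlearning Refinement would periodically re-evaluate which
    # samples still need unlearning and adjust the process (omitted here).
    return unlearn + alpha * retain
```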
arXiv Detail & Related papers (2024-07-25T07:09:35Z) - MUSE: Machine Unlearning Six-Way Evaluation for Language Models [109.76505405962783]
Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content.
We propose MUSE, a comprehensive machine unlearning evaluation benchmark.
We benchmark how effectively eight popular unlearning algorithms can unlearn Harry Potter books and news articles.
arXiv Detail & Related papers (2024-07-08T23:47:29Z) - To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models [39.39428450239399]
Large Language Models (LLMs) trained on extensive corpora inevitably retain sensitive data, such as personal privacy information and copyrighted material.
Recent advancements in knowledge unlearning involve updating LLM parameters to erase specific knowledge.
We introduce KnowUnDo to evaluate if the unlearning process inadvertently erases essential knowledge.
arXiv Detail & Related papers (2024-07-02T03:34:16Z) - Large Language Model Unlearning via Embedding-Corrupted Prompts [10.889859281637406]
We present Embedding-COrrupted (ECO) Prompts, a lightweight unlearning framework for large language models.
We enforce an unlearned state during inference by employing a prompt classifier to identify and safeguard prompts to forget.
We find that these embedding-corrupted prompts not only lead to desirable outputs that satisfy the unlearning objective but also closely approximate the output from a model that has never been trained on the data intended for forgetting.
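Mechanically, this suggests a two-stage inference wrapper: a classifier flags prompts that target the forget data, and only flagged prompts have their input embeddings corrupted before generation. A sketch with assumed interfaces (the paper learns the corruption; this sketch uses simple Gaussian noise as a stand-in):

```python
import torch

def eco_style_generate(model, tokenizer, classifier, prompt, sigma=1.0):
    """Hypothetical inference-time unlearning in the spirit of ECO Prompts:
    model weights stay untouched; only flagged prompts are corrupted.
    Assumes a HuggingFace-style model whose generate() accepts inputs_embeds.

    classifier: callable(prompt) -> True if the prompt targets forget data
    sigma:      scale of the (here Gaussian) embedding corruption
    """
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    embeds = model.get_input_embeddings()(ids)

    if classifier(prompt):
        # The paper learns the corruption; additive noise is a stand-in.
        embeds = embeds + sigma * torch.randn_like(embeds)

    out = model.generate(inputs_embeds=embeds, max_new_tokens=64)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```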
arXiv Detail & Related papers (2024-06-12T06:56:20Z) - Simple linear attention language models balance the recall-throughput tradeoff [60.06020449520365]
We propose BASED, a simple architecture combining linear and sliding window attention.
We train language models up to 1.3b parameters and show that BASED matches the strongest sub-quadratic models in perplexity and outperforms them on real-world recall-intensive tasks by 6.22 accuracy points.
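As a rough illustration of how linear and sliding-window attention can be combined (not the paper's exact architecture), one can sum a global linear-attention path with an exact local-window path; the feature map, window size, and additive mixing below are assumptions:

```python
import torch
import torch.nn.functional as F

def based_style_attention(q, k, v, window=64):
    """Sketch mixing a global linear-attention path with an exact causal
    sliding-window path. Shapes: (batch, seq, dim). For clarity the linear
    path materialises running sums; a real implementation would use a
    recurrent or chunked form to stay O(n) in memory."""
    b, n, d = q.shape

    # Global path: causal linear attention with a simple positive feature map.
    phi = lambda x: F.elu(x) + 1.0
    qf, kf = phi(q), phi(k)
    kv = torch.einsum("bnd,bne->bnde", kf, v).cumsum(dim=1)  # running sum of k^T v
    z = kf.cumsum(dim=1)                                     # running sum of k
    num = torch.einsum("bnd,bnde->bne", qf, kv)
    den = torch.einsum("bnd,bnd->bn", qf, z).unsqueeze(-1) + 1e-6
    global_out = num / den

    # Local path: exact softmax attention restricted to a causal window.
    scores = q @ k.transpose(-1, -2) / d ** 0.5
    i = torch.arange(n)[:, None]
    j = torch.arange(n)[None, :]
    mask = (j > i) | (i - j >= window)   # outside the causal window
    scores = scores.masked_fill(mask, float("-inf"))
    local_out = scores.softmax(dim=-1) @ v

    return global_out + local_out
```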
arXiv Detail & Related papers (2024-02-28T19:28:27Z) - An Information Theoretic Approach to Machine Unlearning [43.423418819707784]
To comply with AI and data regulations, the need to forget private or copyrighted information from trained machine learning models is increasingly important. In this work, we address the zero-shot unlearning scenario, whereby an unlearning algorithm must be able to remove data given only a trained model and the data to be forgotten. We derive a simple but principled zero-shot unlearning method based on the geometry of the model.
arXiv Detail & Related papers (2024-02-02T13:33:30Z) - Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery [56.172872410834664]
Generalized category discovery (GCD) aims at addressing a more realistic and challenging setting of semi-supervised learning.
We propose a Memory Consistency guided Divide-and-conquer Learning framework (MCDL).
Our method outperforms state-of-the-art models by a large margin on both seen and unseen classes in generic image recognition.
arXiv Detail & Related papers (2024-01-24T09:39:45Z) - Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [49.15931834209624]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world. We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique. Elaborating on this, the robustness metric judges a model to be robust only if its performance is consistently accurate across all examples in a clique.
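The clique-based criterion lends itself to a simple metric sketch: count a model as robust on a clique only if it is accurate on every member of that clique. The interfaces below are hypothetical:

```python
def clique_robustness(predict, cliques, is_correct):
    """Hypothetical clique-level robustness score: the fraction of
    knowledge-invariant cliques on which the model is accurate for
    every member example.

    predict:    callable(example) -> model output
    cliques:    list of lists of semantically equivalent examples
    is_correct: callable(example, output) -> bool
    """
    robust = sum(
        all(is_correct(ex, predict(ex)) for ex in clique)
        for clique in cliques
    )
    return robust / len(cliques)
```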
arXiv Detail & Related papers (2023-05-23T12:05:09Z) - Generative Meta-Learning for Zero-Shot Relation Triplet Extraction [20.556880137419064]
Zero-shot Relation Triplet Extraction (ZeroRTE) aims to extract relation triplets from texts containing unseen relation types. Existing approaches typically leverage the knowledge embedded in pre-trained language models to accomplish the generalization process. We propose a generative meta-learning framework which exploits the 'learning-to-learn' ability of meta-learning to boost the generalization capability of generative models.
arXiv Detail & Related papers (2023-05-03T06:34:39Z) - A Memory Transformer Network for Incremental Learning [64.0410375349852]
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes.
One of the most successful existing methods has been the use of a memory of exemplars, which overcomes the issue of catastrophic forgetting by saving a subset of past data into a memory bank and utilizing it to prevent forgetting when training future tasks.
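The exemplar-memory idea is simple to sketch: keep a capped buffer of past examples and mix a replay batch into each training step on a new task. Buffer size and the sampling scheme below are assumptions, not any particular paper's exact design:

```python
import random

class ExemplarMemory:
    """Minimal rehearsal buffer for class-incremental learning: stores a
    subset of past (input, label) pairs and replays them alongside new data
    to mitigate catastrophic forgetting."""

    def __init__(self, capacity=2000):
        self.capacity = capacity
        self.buffer = []

    def add_task(self, examples):
        # Keep a uniform subsample within the capacity after each task.
        self.buffer.extend(examples)
        if len(self.buffer) > self.capacity:
            self.buffer = random.sample(self.buffer, self.capacity)

    def replay_batch(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

During each new task, the training loop would concatenate `replay_batch(...)` with the current batch so that gradients still reflect earlier classes.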
arXiv Detail & Related papers (2022-10-10T08:27:28Z) - Improving Contrastive Learning on Imbalanced Seed Data via Open-World Sampling [96.8742582581744]
We present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK).
MAK follows three simple principles: tailness, proximity, and diversity.
We demonstrate that MAK can consistently improve both the overall representation quality and the class balancedness of the learned features.
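MAK's three principles are not spelled out algorithmically in the abstract; the 'diversity' principle is commonly realised with greedy k-center selection in the model's own feature space, sketched below (tailness and proximity are not modelled here):

```python
import torch

def greedy_k_center(features, k):
    """Greedy k-center selection, one plausible core of a model-aware
    sampler: pick candidates whose embeddings are maximally spread out.

    features: (num_candidates, dim) embeddings of unlabeled candidates
    returns:  indices of the k selected candidates
    """
    chosen = [0]  # arbitrary seed point
    min_dist = torch.cdist(features, features[chosen]).squeeze(1)
    for _ in range(k - 1):
        nxt = int(min_dist.argmax())          # farthest from current picks
        chosen.append(nxt)
        d_new = torch.cdist(features, features[nxt:nxt + 1]).squeeze(1)
        min_dist = torch.minimum(min_dist, d_new)
    return chosen
```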
arXiv Detail & Related papers (2021-11-01T15:09:41Z) - Layer-wise Characterization of Latent Information Leakage in Federated Learning [9.397152006395174]
Training deep neural networks via federated learning allows clients to share, instead of the original data, only the model trained on their data.
Prior work has demonstrated that in practice a client's private information, unrelated to the main learning task, can be discovered from the model's gradients.
There is still no formal approach for quantifying the leakage of private information via the shared updated model or gradients.
arXiv Detail & Related papers (2020-10-17T10:49:14Z) - GRAFFL: Gradient-free Federated Learning of a Bayesian Generative Model [8.87104231451079]
This paper presents the first gradient-free federated learning framework called GRAFFL.
It uses implicit information derived from each participating institution to learn posterior distributions of parameters.
We propose the GRAFFL-based Bayesian mixture model to serve as a proof-of-concept of the framework.
arXiv Detail & Related papers (2020-08-29T07:19:44Z) - Deep F-measure Maximization for End-to-End Speech Understanding [52.36496114728355]
We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation.
We perform experiments on two standard fairness datasets (Adult, and Communities and Crime), as well as on speech-to-intent detection on the ATIS dataset and speech-to-image concept classification on the Speech-COCO dataset.
In all four of these tasks, the F-measure objective improves micro-F1 scores by up to 8% absolute compared to models trained with the cross-entropy loss function.
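The standard construction for a differentiable F-measure replaces hard counts with expected counts under the predicted probabilities; a sketch of a binary soft-F1 loss in that spirit (the paper's approximation may differ in detail):

```python
import torch

def soft_f1_loss(probs, targets, eps=1e-8):
    """Differentiable F1 approximation: true/false positives and false
    negatives become expected counts under the predicted probabilities.

    probs:   (batch,) predicted probabilities of the positive class
    targets: (batch,) binary ground-truth labels (0.0 or 1.0)
    """
    tp = (probs * targets).sum()         # expected true positives
    fp = (probs * (1 - targets)).sum()   # expected false positives
    fn = ((1 - probs) * targets).sum()   # expected false negatives
    f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return 1.0 - f1   # minimising the loss maximises soft F1
```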
arXiv Detail & Related papers (2020-08-08T03:02:27Z) - REALM: Retrieval-Augmented Language Model Pre-Training [37.3178586179607]
We augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia.
For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner.
We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA).
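The central quantity in retrieval-augmented pre-training is the marginal likelihood p(y|x) = sum_z p(z|x) p(y|x,z) over retrieved documents z; a minimal sketch of that marginalisation over a small candidate set, with retriever and reader interfaces assumed:

```python
import torch

def retrieval_marginal_log_likelihood(query_emb, doc_embs, reader_log_probs):
    """REALM-style marginalisation over retrieved documents:
    log p(y|x) = logsumexp_z [ log p(z|x) + log p(y|x,z) ].

    query_emb:        (dim,) embedded query x
    doc_embs:         (num_docs, dim) embedded candidate documents z
    reader_log_probs: (num_docs,) log p(y | x, z) from the reader, per z
    """
    # Retriever: softmax over inner-product relevance scores gives p(z|x).
    retrieval_log_probs = torch.log_softmax(doc_embs @ query_emb, dim=0)
    # Marginalise in log space for numerical stability.
    return torch.logsumexp(retrieval_log_probs + reader_log_probs, dim=0)
```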
arXiv Detail & Related papers (2020-02-10T18:40:59Z)