Beyond Superficial Forgetting: Thorough Unlearning through Knowledge Density Estimation and Block Re-insertion
- URL: http://arxiv.org/abs/2511.11667v1
- Date: Tue, 11 Nov 2025 14:12:43 GMT
- Title: Beyond Superficial Forgetting: Thorough Unlearning through Knowledge Density Estimation and Block Re-insertion
- Authors: Feng Guo, Yuntao Wen, Shen Gao, Junshuo Zhang, Shuo Shang
- Abstract summary: We propose Knowledge Density-Guided Unlearning via Blocks Reinsertion (KUnBR) for large language models. KUnBR identifies layers rich in harmful knowledge and then thoroughly eliminates that knowledge via a re-insertion strategy. Experiments conducted on several unlearning and general capability benchmarks demonstrate that KUnBR achieves state-of-the-art forgetting performance.
- Score: 27.526437626781597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine unlearning, which selectively removes harmful knowledge from a pre-trained model without retraining from scratch, is crucial for addressing privacy, regulatory compliance, and ethical concerns in Large Language Models (LLMs). However, existing unlearning methods often struggle to thoroughly remove harmful knowledge, leaving residual harmful knowledge that can be easily recovered. To address these limitations, we propose Knowledge Density-Guided Unlearning via Blocks Reinsertion (KUnBR), a novel approach that first identifies layers with rich harmful knowledge and then thoroughly eliminates that knowledge via a re-insertion strategy. Our method introduces knowledge density estimation to quantify and locate the layers containing the most harmful knowledge, enabling precise unlearning. Additionally, we design a layer re-insertion strategy that extracts harmful-knowledge-rich layers and re-inserts them into the original LLM, bypassing the gradient obstruction caused by cover layers and ensuring effective gradient propagation during unlearning. Extensive experiments conducted on several unlearning and general capability benchmarks demonstrate that KUnBR achieves state-of-the-art forgetting performance while maintaining model utility.
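The two-step mechanism described above (estimate per-layer knowledge density, then re-insert the densest blocks at the top of the stack so unlearning gradients reach them directly) can be illustrated with a minimal toy sketch. The scalar "blocks", the `estimate_density` proxy, and the `reinsert_dense_blocks` helper below are illustrative assumptions, not the paper's actual estimator or implementation:

```python
# Toy sketch of KUnBR-style density estimation and block re-insertion.
# Each "block" is a scalar multiplier standing in for a transformer layer.

def estimate_density(blocks, forget_inputs):
    # Hypothetical density proxy: mean absolute change each block applies
    # to activations derived from the forget set. Blocks that transform
    # forget inputs more strongly are treated as knowledge-dense.
    densities = []
    x = list(forget_inputs)
    for w in blocks:
        y = [w * v for v in x]
        densities.append(sum(abs(b - a) for a, b in zip(x, y)) / len(x))
        x = y
    return densities

def reinsert_dense_blocks(blocks, densities, k=1):
    # Copy the k highest-density blocks to the end (top) of the stack, so
    # that unlearning gradients reach them without being attenuated by the
    # "cover" layers stacked above their original positions.
    ranked = sorted(range(len(blocks)), key=lambda i: densities[i], reverse=True)
    return blocks + [blocks[i] for i in ranked[:k]]

d = estimate_density([1.0, 3.0, 1.0], [1.0, 2.0])
print(d)                                          # per-block density scores
print(reinsert_dense_blocks([1.0, 3.0, 1.0], d))  # densest block re-inserted
```

In a real model the re-inserted layers would then receive the unlearning loss directly, and the paper's density estimator replaces the activation-change proxy used here.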
Related papers
- KUDA: Knowledge Unlearning by Deviating Representation for Large Language Models [26.418820118903852]
Large language models (LLMs) acquire a large amount of knowledge through pre-training on vast and diverse corpora. LLM unlearning is a promising technique for reducing the risks associated with sensitive, copyrighted, or harmful content in training data. We propose Knowledge Unlearning by Deviating representAtion (KUDA) to achieve effective unlearning at the knowledge level of LLMs.
arXiv Detail & Related papers (2026-02-22T17:16:49Z) - CATNIP: LLM Unlearning via Calibrated and Tokenized Negative Preference Alignment [14.853204323785334]
Existing approaches, rooted in Gradient Ascent (GA), often degrade general domain knowledge while relying on retention data or curated contrastive pairs. We develop a principled method that rescales unlearning effects in proportion to the model's token-level confidence. Our work enables effective unlearning without requiring retention data or contrastive unlearning response pairs.
arXiv Detail & Related papers (2026-02-02T21:23:54Z) - KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints [29.0623696841584]
Large Multimodal Models encode extensive factual knowledge in their pre-trained weights. Existing methods often struggle to learn new knowledge and suffer from catastrophic forgetting. We propose KORE, a method of injecting new knowledge into large multimodal models while preserving old knowledge.
arXiv Detail & Related papers (2025-10-22T07:26:55Z) - LLM Unlearning on Noisy Forget Sets: A Study of Incomplete, Rewritten, and Watermarked Data [69.5099112089508]
Large language models (LLMs) exhibit remarkable generative capabilities but raise ethical and security concerns by memorizing sensitive data. This work presents the first study of unlearning under perturbed or low-fidelity forget data, referred to as noisy forget sets. We find that unlearning remains surprisingly robust to perturbations, provided that core semantic signals are preserved.
arXiv Detail & Related papers (2025-10-10T05:10:49Z) - Understanding the Dilemma of Unlearning for Large Language Models [50.54260066313032]
Unlearning seeks to remove specific knowledge from large language models (LLMs). We propose unPact, an interpretable framework for unlearning via prompt attribution and contribution tracking.
arXiv Detail & Related papers (2025-09-29T12:15:19Z) - Step-by-Step Reasoning Attack: Revealing 'Erased' Knowledge in Large Language Models [9.719371187651591]
Unlearning techniques often merely suppress knowledge, leaving it beneath the surface and retrievable with the right prompts. We introduce a step-by-step reasoning-based black-box attack, Sleek, that systematically exposes unlearning failures. Of the generated adversarial prompts, 62.5% successfully retrieved forgotten Harry Potter facts from WHP-unlearned Llama, while 50% exposed unfair suppression of retained knowledge.
arXiv Detail & Related papers (2025-06-14T04:22:17Z) - Safety Alignment via Constrained Knowledge Unlearning [11.225354394106226]
We propose a novel safety alignment strategy, Constrained Knowledge Unlearning (CKU). CKU focuses on two primary objectives: knowledge localization and retention, and unlearning harmful knowledge. Experimental results demonstrate that CKU significantly enhances model safety without compromising overall performance.
arXiv Detail & Related papers (2025-05-24T08:29:50Z) - Enhancing LLM Knowledge Learning through Generalization [73.16975077770765]
We show that an LLM's ability to continually predict the same factual knowledge tokens given diverse paraphrased contexts is positively correlated with its capacity to extract that knowledge via question-answering. We propose two strategies to enhance LLMs' ability to predict the same knowledge tokens given varied contexts, thereby enhancing knowledge acquisition.
arXiv Detail & Related papers (2025-03-05T17:56:20Z) - InfuserKI: Enhancing Large Language Models with Knowledge Graphs via Infuser-Guided Knowledge Integration [58.61492157691623]
Methods for integrating knowledge have been developed, which augment LLMs with domain-specific knowledge graphs through external modules. Our research focuses on a novel problem: efficiently integrating unknown knowledge into LLMs without unnecessary overlap with known knowledge. A risk of introducing new knowledge is the potential forgetting of existing knowledge.
arXiv Detail & Related papers (2024-02-18T03:36:26Z) - KnowTuning: Knowledge-aware Fine-tuning for Large Language Models [83.5849717262019]
We propose a knowledge-aware fine-tuning (KnowTuning) method to improve fine-grained and coarse-grained knowledge awareness of LLMs.
KnowTuning generates more facts with less factual error rate under fine-grained facts evaluation.
arXiv Detail & Related papers (2024-02-17T02:54:32Z) - Towards Safer Large Language Models through Machine Unlearning [19.698620794387338]
Selective Knowledge Unlearning (SKU) is designed to eliminate harmful knowledge while preserving utility on normal prompts.
The first stage identifies and acquires the harmful knowledge within the model, while the second stage removes it.
Our experiments demonstrate that SKU identifies a good balance point between removing harmful information and preserving utility.
arXiv Detail & Related papers (2024-02-15T16:28:34Z) - Learning with Recoverable Forgetting [77.56338597012927]
Learning wIth Recoverable Forgetting explicitly handles the task- or sample-specific knowledge removal and recovery.
Specifically, LIRF brings in two innovative schemes, namely knowledge deposit and withdrawal.
We conduct experiments on several datasets, and demonstrate that the proposed LIRF strategy yields encouraging results with gratifying generalization capability.
arXiv Detail & Related papers (2022-07-17T16:42:31Z) - Preserving Earlier Knowledge in Continual Learning with the Help of All Previous Feature Extractors [63.21036904487014]
Continual learning of new knowledge over time is a desirable capability for intelligent systems that must recognize an ever-growing number of object classes.
We propose a simple yet effective fusion mechanism by including all the previously learned feature extractors into the intelligent model.
Experiments on multiple classification tasks show that the proposed approach can effectively reduce the forgetting of old knowledge, achieving state-of-the-art continual learning performance.
arXiv Detail & Related papers (2021-04-28T07:49:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.