Attention Smoothing Is All You Need For Unlearning
- URL: http://arxiv.org/abs/2603.01285v1
- Date: Sun, 01 Mar 2026 21:39:35 GMT
- Title: Attention Smoothing Is All You Need For Unlearning
- Authors: Saleh Zare Zade, Xiangyu Zhou, Sijia Liu, Dongxiao Zhu
- Abstract summary: Large Language Models are prone to memorizing sensitive, copyrighted, or hazardous content, posing significant privacy and legal concerns. We propose Attention Smoothing Unlearning (ASU), a principled framework that casts unlearning as self-distillation from a forget-teacher derived from the model's own attention.
- Score: 12.239021292288967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models are prone to memorizing sensitive, copyrighted, or hazardous content, posing significant privacy and legal concerns. Retraining from scratch is computationally infeasible, whereas current unlearning methods exhibit unstable trade-offs between forgetting and utility, frequently producing incoherent outputs on forget prompts and failing to generalize due to the persistence of lexical-level and semantic-level associations in attention. We propose Attention Smoothing Unlearning (ASU), a principled framework that casts unlearning as self-distillation from a forget-teacher derived from the model's own attention. By increasing the softmax temperature, ASU flattens attention distributions and directly suppresses the lexical-level and semantic-level associations responsible for reconstructing memorized knowledge. This results in a bounded optimization objective that erases factual information yet maintains coherence in responses to forget prompts. Empirical evaluation on TOFU, MUSE, and WMDP, along with real-world and continual unlearning scenarios across question answering and text completion, demonstrates that ASU outperforms baselines in most unlearning scenarios, delivering robust unlearning with minimal loss of model utility.
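The abstract describes ASU's mechanism only at a high level: flatten attention by raising the softmax temperature, then self-distill from the resulting forget-teacher. As a rough illustration, here is a minimal PyTorch sketch of those two ingredients; the function names, the temperature value `tau`, and the plain KL distillation objective are assumptions made for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def smoothed_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                       tau: float = 2.0) -> torch.Tensor:
    """Scaled dot-product attention with a raised softmax temperature.

    tau > 1 flattens the attention weights, weakening the sharp
    lexical/semantic associations the abstract blames for reconstructing
    memorized content; tau = 1 recovers standard attention.
    (tau = 2.0 is an arbitrary illustrative choice.)
    """
    d_k = q.size(-1)
    scores = (q @ k.transpose(-2, -1)) / (d_k ** 0.5)
    attn = F.softmax(scores / tau, dim=-1)  # temperature-flattened weights
    return attn @ v

def self_distillation_loss(student_logits: torch.Tensor,
                           teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL divergence pulling the student toward the forget-teacher's
    predictive distribution, where the teacher is the model's own forward
    pass run with smoothed attention. The exact loss ASU minimizes is not
    given in the abstract; plain KL is an assumption.
    """
    teacher_probs = F.softmax(teacher_logits, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```

As tau grows, the smoothed weights approach a uniform distribution over positions, the limiting case of the flattening the abstract describes. Unlike gradient-ascent forgetting losses, which are unbounded below, minimizing a KL toward a fixed teacher distribution has a finite optimum of zero, which is consistent with the abstract's claim of a bounded objective.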
Related papers
- Consistency-Aware Editing for Entity-level Unlearning in Language Models [53.522931419965424]
We introduce a novel consistency-aware editing (CAE) framework for entity-level unlearning. CAE aggregates a diverse set of prompts related to a target entity, including its attributes, relations, and adversarial paraphrases. It then jointly learns a low-rank update guided by a consistency regularizer that aligns the editing directions across prompts.
arXiv Detail & Related papers (2025-12-19T15:18:07Z)
- Towards Reasoning-Preserving Unlearning in Multimodal Large Language Models [17.184948937224142]
Machine unlearning aims to erase requested data from trained models without full retraining. Intermediate chain-of-thought steps can still leak sensitive information even when final answers are forgotten. We propose R-MUSE, a training-free, inference-time intervention framework that steers internal representations to forget both answers and reasoning traces.
arXiv Detail & Related papers (2025-11-26T13:45:52Z)
- Towards Benign Memory Forgetting for Selective Multimodal Large Language Model Unlearning [49.274436951541425]
Multimodal Large Language Models (MLLMs) achieve remarkable capabilities but can inadvertently memorize privacy-sensitive information. Existing unlearning methods fail to achieve benign forgetting because they often degrade the model's general image understanding performance. We propose the Sculpted Memory Forgetting Adapter (SMFA), which confines forgetting to targeted memory regions while preserving overall capabilities.
arXiv Detail & Related papers (2025-11-25T11:22:45Z)
- Wisdom is Knowing What not to Say: Hallucination-Free LLMs Unlearning via Attention Shifting [11.725875396424927]
We introduce the Attention-Shifting (AS) framework for selective unlearning. AS is driven by two design objectives: (1) context-preserving suppression, which attenuates attention to fact-bearing tokens without disrupting LLMs' linguistic structure; and (2) hallucination-resistant response shaping, which discourages fabricated completions when the model is queried about unlearned content. Experimental results show that AS improves performance over state-of-the-art unlearning methods, achieving up to 15% higher accuracy on the ToFU benchmark and 10% on the TDEC benchmark, while maintaining competitive hallucination-free unlearning effectiveness.
arXiv Detail & Related papers (2025-10-20T06:50:03Z)
- LLM Unlearning on Noisy Forget Sets: A Study of Incomplete, Rewritten, and Watermarked Data [69.5099112089508]
Large language models (LLMs) exhibit remarkable generative capabilities but raise ethical and security concerns by memorizing sensitive data. This work presents the first study of unlearning under perturbed or low-fidelity forget data, referred to as noisy forget sets. We find that unlearning remains surprisingly robust to perturbations, provided that core semantic signals are preserved.
arXiv Detail & Related papers (2025-10-10T05:10:49Z)
- Scalable and Robust LLM Unlearning by Correcting Responses with Retrieved Exclusions [49.55618517046225]
Language models trained on web-scale corpora risk memorizing and exposing sensitive information. We propose Corrective Unlearning with Retrieved Exclusions (CURE), a novel unlearning framework. CURE verifies model outputs for leakage and revises them into safe responses.
arXiv Detail & Related papers (2025-09-30T09:07:45Z)
- BLUR: A Bi-Level Optimization Approach for LLM Unlearning [100.90394814817965]
We argue that it is important to model the hierarchical structure of the unlearning problem. We propose a novel algorithm, termed Bi-Level UnleaRning (BLUR), which delivers superior performance.
arXiv Detail & Related papers (2025-06-09T19:23:05Z)
- Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness [46.653774740885275]
Machine unlearning techniques aim to mitigate unintended memorization in large language models (LLMs). We propose a knowledge unlearning evaluation framework that more accurately captures the implicit structure of real-world knowledge. Our framework provides a more realistic and rigorous assessment of unlearning performance.
arXiv Detail & Related papers (2025-06-06T04:35:19Z)
- Not All Tokens Are Meant to Be Forgotten [13.060635265281864]
Large Language Models (LLMs) exhibit remarkable human-level language understanding, reasoning, and decision-making abilities. However, LLMs tend to memorize unwanted information, such as private or copyrighted content, raising significant privacy and legal concerns.
arXiv Detail & Related papers (2025-06-03T17:59:05Z)
- Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models [81.62767292169225]
We investigate knowledge forgetting in large language models with a focus on its generalisation. We propose PerMU, a novel probability perturbation-based unlearning paradigm. Experiments are conducted on a diverse range of datasets, including TOFU, Harry Potter, ZsRE, WMDP, and MUSE.
arXiv Detail & Related papers (2025-02-27T11:03:33Z)
- ReLearn: Unlearning via Learning for Large Language Models [64.2802606302194]
We propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning. This framework introduces the Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation. Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output.
arXiv Detail & Related papers (2025-02-16T16:31:00Z)