GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models
- URL: http://arxiv.org/abs/2506.10946v1
- Date: Thu, 12 Jun 2025 17:49:09 GMT
- Title: GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models
- Authors: Evelyn Ma, Duo Zhou, Peizhi Niu, Huiting Zhou, Huan Zhang, Olgica Milenkovic, S. Rasoul Etesami
- Abstract summary: GUARD is a novel framework for guided unlearning and retention via data attribution. At its core, GUARD introduces a lightweight proxy data attribution metric tailored for LLM unlearning. We provide rigorous theoretical guarantees that GUARD significantly enhances retention while maintaining forgetting metrics comparable to prior methods.
- Score: 23.667160042806064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unlearning in large language models (LLMs) is becoming increasingly important due to regulatory compliance, copyright protection, and privacy concerns. However, a key challenge in LLM unlearning is unintended forgetting, where the removal of specific data inadvertently impairs the utility of the model and its retention of valuable, desired information. While prior work has primarily focused on architectural innovations, the influence of data-level factors on unlearning performance remains underexplored. As a result, existing methods often suffer from degraded retention when forgetting high-impact data. To address this, we propose GUARD, a novel framework for Guided Unlearning And Retention via Data attribution. At its core, GUARD introduces a lightweight proxy data attribution metric tailored for LLM unlearning, which quantifies the "alignment" between the forget and retain sets while remaining computationally efficient. Building on this, we design a novel unlearning objective that assigns adaptive, nonuniform unlearning weights to samples, inversely proportional to their proxy attribution scores. Through such a reallocation of unlearning power, GUARD mitigates unintended losses in retention. We provide rigorous theoretical guarantees that GUARD significantly enhances retention while maintaining forgetting metrics comparable to prior methods. Extensive experiments on the TOFU benchmark across multiple LLM architectures demonstrate that GUARD substantially improves utility preservation while ensuring effective unlearning. Notably, GUARD reduces utility sacrifice on the Retain Set by up to 194.92% in terms of Truth Ratio when forgetting 10% of the training data.
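The abstract states only that the proxy attribution score measures forget/retain alignment and that per-sample unlearning weights are inversely proportional to it; the exact definitions are in the paper itself. The PyTorch sketch below is one plausible reading of that scheme, assuming the proxy score is the cosine similarity between a forget sample's loss gradient and the mean retain-set gradient, and that unlearning is weighted gradient ascent on the forget loss. Function names and the toy model are illustrative, not the authors' code.

```python
# Hedged sketch of GUARD-style adaptive unlearning weights, NOT the authors' code.
# Assumption: the proxy attribution score of a forget sample is the cosine
# similarity between its loss gradient and the mean retain-set gradient, so
# forget samples highly aligned with the retain set receive less "unlearning power".
import torch
import torch.nn.functional as F


def flat_grad(model, loss_fn, x, y):
    """Flattened gradient of the per-sample loss w.r.t. trainable parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss_fn(model(x), y), params)
    return torch.cat([g.reshape(-1) for g in grads])


def proxy_attribution_scores(model, loss_fn, forget_batch, retain_batch):
    """Alignment of each forget sample with the retain set (assumed proxy metric)."""
    retain_dir = torch.stack(
        [flat_grad(model, loss_fn, x, y) for x, y in retain_batch]
    ).mean(dim=0)
    return torch.stack(
        [F.cosine_similarity(flat_grad(model, loss_fn, x, y), retain_dir, dim=0)
         for x, y in forget_batch]
    )


def adaptive_unlearning_weights(scores, eps=1e-3):
    """Weights inversely proportional to the (clamped) scores, renormalized so the
    total unlearning power of the batch is unchanged."""
    inv = 1.0 / (scores.clamp(min=0.0) + eps)
    return inv * (inv.numel() / inv.sum())


def weighted_unlearning_loss(model, loss_fn, forget_batch, weights):
    """Weighted gradient ascent on the forget loss (one common unlearning choice)."""
    losses = torch.stack([loss_fn(model(x), y) for x, y in forget_batch])
    return -(weights.detach() * losses).mean()


if __name__ == "__main__":
    # Toy stand-in for an LLM; a real model would use token-level NLL as loss_fn.
    model = torch.nn.Linear(16, 4)
    loss_fn = torch.nn.CrossEntropyLoss()
    forget = [(torch.randn(1, 16), torch.randint(0, 4, (1,))) for _ in range(4)]
    retain = [(torch.randn(1, 16), torch.randint(0, 4, (1,))) for _ in range(16)]
    w = adaptive_unlearning_weights(proxy_attribution_scores(model, loss_fn, forget, retain))
    weighted_unlearning_loss(model, loss_fn, forget, w).backward()
```

In this reading, high-attribution forget samples (those whose removal would most disturb the retain set) are down-weighted, which is how the reallocation of unlearning power described in the abstract would mitigate unintended retention loss.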
Related papers
- BLUR: A Bi-Level Optimization Approach for LLM Unlearning [105.98410883830596]
We argue that it is important to model the hierarchical structure of the unlearning problem. We propose a novel algorithm, termed Bi-Level UnleaRning (BLUR), which delivers superior performance.
arXiv Detail & Related papers (2025-06-09T19:23:05Z) - Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models [7.566515311806724]
Large Language Models (LLMs) deployed in real-world settings increasingly face the need to unlearn sensitive, outdated, or proprietary information. Existing unlearning methods formulate forgetting and retention as a regularized trade-off, combining both objectives into a single scalarized loss. We propose a new formulation of LLM unlearning as a constrained optimization problem: forgetting is enforced via a novel logit-margin flattening loss.
arXiv Detail & Related papers (2025-06-05T17:55:23Z) - GUARD: Generation-time LLM Unlearning via Adaptive Restriction and Detection [36.38245533018162]
Large Language Models (LLMs) have demonstrated strong capabilities in memorizing vast amounts of knowledge across diverse domains. Existing unlearning efforts typically fine-tune the model with resources such as forget data, retain data, and a calibration model. We propose Generation-time Unlearning via Adaptive Restriction and Detection (GUARD), a framework that enables dynamic unlearning during LLM generation.
arXiv Detail & Related papers (2025-05-19T16:26:58Z) - LLM Unlearning via Loss Adjustment with Only Forget Data [20.310423152885217]
We introduce Forget data only Loss AdjustmenT (FLAT), a "flat" loss adjustment approach that performs unlearning using only the forget data.
Empirical results demonstrate that our approach achieves superior unlearning performance compared to existing methods.
arXiv Detail & Related papers (2024-10-14T23:43:33Z) - Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs [25.91643745340183]
Large Language Models (LLMs) have demonstrated strong reasoning and memorization capabilities via pretraining on massive textual corpora. This poses risks of privacy and copyright violations, highlighting the need for efficient machine unlearning methods. We propose Low-rank Knowledge Unlearning (LoKU), a novel framework that enables robust and efficient unlearning for LLMs.
arXiv Detail & Related papers (2024-08-13T04:18:32Z) - Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models [37.172662930947446]
Language models (LMs) are potentially vulnerable to extraction attacks, which represent a significant privacy risk.
We propose Privacy Protection via Optimal Parameters (POP), a novel unlearning method that effectively forgets the target token sequences from the pretrained LM.
POP exhibits remarkable retention performance post-unlearning across 9 classification and 4 dialogue benchmarks, outperforming the state-of-the-art by a large margin.
arXiv Detail & Related papers (2024-06-20T08:12:49Z) - Towards Effective Evaluations and Comparisons for LLM Unlearning Methods [97.2995389188179]
This paper seeks to refine the evaluation of machine unlearning for large language models. It addresses two key challenges: the robustness of evaluation metrics and the trade-offs between competing goals.
arXiv Detail & Related papers (2024-06-13T14:41:00Z) - Ungeneralizable Examples [70.76487163068109]
Current approaches to creating unlearnable data involve incorporating small, specially designed noises.
We extend the concept of unlearnable data to conditional data learnability and introduce UnGeneralizable Examples (UGEs).
UGEs exhibit learnability for authorized users while maintaining unlearnability for potential hackers.
arXiv Detail & Related papers (2024-04-22T09:29:14Z) - Augmenting Unsupervised Reinforcement Learning with Self-Reference [63.68018737038331]
Humans possess the ability to draw on past experiences explicitly when learning new tasks.
We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information.
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
arXiv Detail & Related papers (2023-11-16T09:07:34Z) - Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an efficient unlearning framework that updates LLMs without retraining the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z) - From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, identifies discrepancies between a model's expected responses and its intrinsic generation capability (a hedged sketch of this metric appears after this list).
arXiv Detail & Related papers (2023-08-23T09:45:29Z) - RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z)
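For the Instruction-Following Difficulty (IFD) metric referenced in the "From Quantity to Quality" entry above, my reading of that paper is that IFD is the ratio of the answer's loss conditioned on the instruction to the answer's loss on its own, so higher values flag samples where the instruction provides little guidance. The sketch below is a hedged illustration using Hugging Face transformers; the helper names and the small stand-in model are placeholders, not the authors' code, and the formula should be checked against the original paper.

```python
# Hedged sketch of the IFD metric: IFD = loss(answer | instruction) / loss(answer).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def answer_nll(model, tokenizer, context: str, answer: str) -> float:
    """Average negative log-likelihood of the answer tokens given the context."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    ans_ids = tokenizer(answer, return_tensors="pt", add_special_tokens=False).input_ids
    input_ids = torch.cat([ctx_ids, ans_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : ctx_ids.shape[1]] = -100  # score only the answer tokens
    with torch.no_grad():
        out = model(input_ids, labels=labels)
    return out.loss.item()


def ifd_score(model, tokenizer, instruction: str, answer: str) -> float:
    """Higher IFD = the instruction helps less, i.e. a harder, more informative sample."""
    conditioned = answer_nll(model, tokenizer, instruction, answer)
    unconditioned = answer_nll(model, tokenizer, "", answer)
    return conditioned / unconditioned


if __name__ == "__main__":
    # "gpt2" is used only as a small stand-in model for illustration.
    tok = AutoTokenizer.from_pretrained("gpt2")
    lm = AutoModelForCausalLM.from_pretrained("gpt2")
    print(ifd_score(lm, tok, "Translate to French: Hello.", " Bonjour."))
```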