GRU: Mitigating the Trade-off between Unlearning and Retention for LLMs
- URL: http://arxiv.org/abs/2503.09117v3
- Date: Thu, 05 Jun 2025 13:34:42 GMT
- Title: GRU: Mitigating the Trade-off between Unlearning and Retention for LLMs
- Authors: Yue Wang, Qizhou Wang, Feng Liu, Wei Huang, Yali Du, Xiaojiang Du, Bo Han
- Abstract summary: We propose Gradient Rectified Unlearning (GRU), an improved framework that regulates the directions of gradient updates during the unlearning procedure. GRU is easy and general to implement, demonstrating practical effectiveness across a variety of well-established unlearning benchmarks.
- Score: 34.90826139012299
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language model (LLM) unlearning has demonstrated its essential role in removing privacy- and copyright-related responses, which is crucial for the legal and safe application of LLMs. However, the pursuit of complete unlearning often comes at substantial cost to the model's general functionality, leading to a notorious trade-off between unlearning and retention. This motivates us to explore enhanced unlearning schemes that mitigate this trade-off. Specifically, we propose Gradient Rectified Unlearning (GRU), an improved framework that regulates the directions of gradient updates during the unlearning procedure so that their side effects on other, unrelated responses are minimized. GRU is easy and general to implement, demonstrating practical effectiveness across a variety of well-established unlearning benchmarks.
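The abstract's core idea, regulating the direction of each unlearning update so it does not harm retained behavior, can be sketched as a gradient projection. Below is a minimal illustration assuming a PCGrad-style rectification rule; the function and variable names are ours, not the authors' reference implementation:

```python
import numpy as np

def rectified_unlearning_update(g_unlearn: np.ndarray,
                                g_retain: np.ndarray,
                                lr: float = 1e-2) -> np.ndarray:
    """Rectify the unlearning gradient so the step cannot ascend the
    retention loss: if the two gradients conflict (negative inner
    product), project the conflicting component out of g_unlearn.
    Illustrative sketch, not GRU's exact rule."""
    dot = float(g_unlearn @ g_retain)
    if dot < 0.0:
        g_unlearn = g_unlearn - (dot / (g_retain @ g_retain + 1e-12)) * g_retain
    return -lr * g_unlearn  # descent step on the (rectified) unlearning loss

# Toy check: a conflicting pair gets rectified to be orthogonal to g_retain.
g_u, g_r = np.array([1.0, -2.0]), np.array([1.0, 1.0])
step = rectified_unlearning_update(g_u, g_r)
print(step @ g_r)  # ~0: the update no longer degrades retention (first order)
```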
Related papers
- MeGU: Machine-Guided Unlearning with Target Feature Disentanglement [73.49657372882082]
We propose a novel framework that guides unlearning through concept-aware re-alignment. MeGU enables controlled and selective forgetting, effectively mitigating both under-unlearning and over-unlearning.
arXiv Detail & Related papers (2026-02-19T05:20:31Z) - Scalable and Robust LLM Unlearning by Correcting Responses with Retrieved Exclusions [49.55618517046225]
Language models trained on web-scale corpora risk memorizing and exposing sensitive information. We propose Corrective Unlearning with Retrieved Exclusions (CURE), a novel unlearning framework that verifies model outputs for leakage and revises them into safe responses.
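A plausible shape for the verify-then-revise loop described above, with `generate`, `retrieve_exclusions`, and `revise` as hypothetical stand-ins for the base LLM, the exclusion retriever, and the reviser (not CURE's actual API):

```python
from typing import Callable, List

def cure_style_answer(prompt: str,
                      generate: Callable[[str], str],
                      retrieve_exclusions: Callable[[str], List[str]],
                      revise: Callable[[str, List[str]], str]) -> str:
    """Draft an answer, check it against retrieved exclusions, and revise
    any response that leaks excluded content into a safe one."""
    draft = generate(prompt)
    exclusions = retrieve_exclusions(prompt)
    # Naive leak check for illustration: verbatim overlap with exclusions.
    leaks = [e for e in exclusions if e.lower() in draft.lower()]
    return revise(draft, leaks) if leaks else draft
```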
arXiv Detail & Related papers (2025-09-30T09:07:45Z) - Teaching RL Agents to Act Better: VLM as Action Advisor for Online Reinforcement Learning [5.025037011107095]
Vision-language action (VLA) policies represent a promising direction for solving diverse tasks. We propose VARL (VLM as Action advisor for online Reinforcement Learning), a framework that leverages the domain knowledge of vision-language models (VLMs) to provide action suggestions for reinforcement learning agents.
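One simple way such an advisor could plug into an online RL loop; `vlm_suggest` is a hypothetical call to the VLM, and the uncertainty-based deferral rule is our assumption, not necessarily VARL's:

```python
from typing import Callable, Dict, Hashable

def advised_action(state,
                   policy_probs: Dict[Hashable, float],
                   vlm_suggest: Callable,
                   confidence_thresh: float = 0.9):
    """Follow the policy when it is confident; otherwise ask the VLM
    advisor for an action suggestion (illustrative deferral rule)."""
    action, prob = max(policy_probs.items(), key=lambda kv: kv[1])
    return action if prob >= confidence_thresh else vlm_suggest(state)
```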
arXiv Detail & Related papers (2025-09-25T13:16:34Z) - GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models [23.667160042806064]
GUARD is a novel framework for guided unlearning and retention via data attribution. At its core, GUARD introduces a lightweight proxy data attribution metric tailored for LLM unlearning. We provide rigorous theoretical guarantees that GUARD significantly enhances retention while maintaining forgetting metrics comparable to prior methods.
arXiv Detail & Related papers (2025-06-12T17:49:09Z) - From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning [76.09281171131941]
Large language models (LLMs) can transform education, but their optimization for direct question-answering often undermines effective pedagogy. We propose an online reinforcement learning (RL)-based alignment framework that can quickly adapt LLMs into effective tutors.
arXiv Detail & Related papers (2025-05-21T15:00:07Z) - Efficient Machine Unlearning by Model Splitting and Core Sample Selection [4.634454848598446]
We introduce a variant of the standard unlearning metric that enables more efficient and precise unlearning strategies. We also present an unlearning-aware training procedure that, in many cases, allows for exact unlearning. When exact unlearning is not feasible, MaxRR still supports efficient unlearning with properties closely matching those achieved through full retraining.
arXiv Detail & Related papers (2025-05-11T15:42:11Z) - GRAIL: Gradient-Based Adaptive Unlearning for Privacy and Copyright in LLMs [26.13653211674955]
Large Language Models (LLMs) trained on extensive datasets often learn sensitive information.
Retraining entire models from scratch to remove undesired information is both costly and impractical.
We propose GRAIL (GRadient-based AdaptIve unLearning), a novel multi-domain unlearning framework.
arXiv Detail & Related papers (2025-04-17T06:16:32Z) - SAEs $\textit{Can}$ Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs [24.48560556882878]
We introduce Dynamic SAE Guardrails (DSG), a novel method for precision unlearning.
Our experiments show DSG substantially outperforms leading unlearning methods.
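From the title, the guardrail operates on sparse autoencoder (SAE) features. A rough sketch of what clamping forget-related SAE latents at inference could look like; the weights, indices, and trigger rule here are illustrative assumptions, and DSG's dynamic gating is more sophisticated:

```python
import numpy as np

def sae_guardrail(h, W_enc, b_enc, W_dec, forget_idx, fire_thresh=0.0):
    """Encode a residual-stream activation with an SAE, zero out latents
    previously tied to the forget set if they fire, and decode back.
    All names are illustrative, not DSG's actual interface."""
    z = np.maximum(W_enc @ h + b_enc, 0.0)   # SAE encoding (ReLU)
    if (z[forget_idx] > fire_thresh).any():  # intervene only when needed
        z[forget_idx] = 0.0
    return W_dec @ z                         # back to the residual stream
```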
arXiv Detail & Related papers (2025-04-11T01:24:03Z) - Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond [39.39558417665764]
Large language models (LLMs) should undergo rigorous audits to identify potential risks, such as copyright and privacy infringements. We propose the gradient effect (G-effect), a toolkit that quantifies the impacts of unlearning objectives on model performance from a gradient perspective.
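To first order, the impact of stepping on an unlearning objective against some performance measure is captured by the inner product of their gradients; this is a simplified reading of the G-effect idea, and the function below is our illustration, not the paper's toolkit:

```python
import numpy as np

def g_effect(g_objective: np.ndarray, g_metric: np.ndarray) -> float:
    """Predicted first-order change in a performance metric when taking a
    unit descent step on the unlearning objective; negative values flag
    objectives that degrade the metric."""
    return -float(g_objective @ g_metric)
```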
arXiv Detail & Related papers (2025-02-26T16:59:21Z) - ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification [53.80183105328448]
Refine via Intrinsic Self-Verification (ReVISE) is an efficient framework that enables LLMs to self-correct their outputs through self-verification. Our experiments on various reasoning tasks demonstrate that ReVISE achieves efficient self-correction and significantly improves reasoning performance.
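The described generate-verify-refine behavior can be outlined as a small loop; `generate`, `verify`, and `refine` stand in for the model's own sampling, intrinsic verification, and refinement calls (illustrative names, not ReVISE's API):

```python
from typing import Callable

def revise_style_answer(prompt: str,
                        generate: Callable[[str], str],
                        verify: Callable[[str, str], bool],
                        refine: Callable[[str, str], str],
                        max_rounds: int = 3) -> str:
    """Sample an answer, then refine it until self-verification passes
    or the round budget is exhausted."""
    answer = generate(prompt)
    for _ in range(max_rounds):
        if verify(prompt, answer):
            break
        answer = refine(prompt, answer)
    return answer
```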
arXiv Detail & Related papers (2025-02-20T13:50:02Z) - CGLearn: Consistent Gradient-Based Learning for Out-of-Distribution Generalization [0.7366405857677226]
In this work, we introduce a simple yet powerful approach, CGLearn, which relies on the agreement of gradients across various environments.
Our proposed method demonstrates superior performance compared to state-of-the-art methods in both linear and nonlinear settings.
Comprehensive experiments on both synthetic and real-world datasets highlight its effectiveness in diverse scenarios.
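Gradient agreement across environments is commonly enforced with a sign-agreement mask; a minimal sketch in that spirit (an AND-mask-style rule, which may differ from CGLearn's exact criterion):

```python
import numpy as np

def agreement_masked_gradient(env_grads: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Average per-environment gradients, keeping only coordinates whose
    signs agree across environments at least at level tau (1.0 = unanimous).
    env_grads has shape (n_envs, n_params)."""
    agreement = np.abs(np.sign(env_grads).mean(axis=0))
    return env_grads.mean(axis=0) * (agreement >= tau)
```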
arXiv Detail & Related papers (2024-11-09T02:36:39Z) - Dataset Awareness is not Enough: Implementing Sample-level Tail Encouragement in Long-tailed Self-supervised Learning [16.110763554788445]
We introduce pseudo-labels into self-supervised long-tailed learning, utilizing pseudo-label information to drive a dynamic temperature and re-weighting strategy.
We analyze the lack of quantity awareness in the temperature parameter and use re-weighting to compensate for this deficiency, thereby achieving optimal training patterns at the sample level.
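One concrete form of quantity-aware re-weighting suggested by this description is inverse-frequency weighting over pseudo-label clusters; a sketch under that assumption (the paper's actual strategy also couples this with a dynamic temperature):

```python
import numpy as np

def inverse_frequency_weights(pseudo_labels: np.ndarray) -> np.ndarray:
    """Per-sample loss weights that up-weight rare pseudo-label clusters,
    normalized to mean 1 so the overall loss scale is unchanged."""
    labels, counts = np.unique(pseudo_labels, return_counts=True)
    count_of = dict(zip(labels.tolist(), counts.tolist()))
    w = np.array([1.0 / count_of[int(l)] for l in pseudo_labels])
    return w / w.mean()
```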
arXiv Detail & Related papers (2024-10-30T10:25:22Z) - A Closer Look at Machine Unlearning for Large Language Models [46.245404272612795]
Large language models (LLMs) may memorize sensitive or copyrighted content, raising privacy and legal concerns. We discuss several issues in machine unlearning for LLMs and provide our insights on possible approaches.
arXiv Detail & Related papers (2024-10-10T16:56:05Z) - Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle it by tuning VLMs with knowledge distillation on extra datasets, which demands heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining the pre-trained knowledge of VLMs.
arXiv Detail & Related papers (2024-07-07T12:19:37Z) - Towards Effective Evaluations and Comparisons for LLM Unlearning Methods [97.2995389188179]
This paper seeks to refine the evaluation of machine unlearning for large language models. It addresses two key challenges: the robustness of evaluation metrics and the trade-offs between competing goals.
arXiv Detail & Related papers (2024-06-13T14:41:00Z) - Rethinking Machine Unlearning for Large Language Models [85.92660644100582]
We explore machine unlearning in the domain of large language models (LLMs). This initiative aims to eliminate undesirable data influence (e.g., sensitive or illegal information) and the associated model capabilities.
arXiv Detail & Related papers (2024-02-13T20:51:58Z) - InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance [56.184255657175335]
We develop InferAligner, a novel inference-time alignment method that utilizes cross-model guidance for harmlessness alignment.
Experimental results show that our method can be very effectively applied to domain-specific models in finance, medicine, and mathematics.
It significantly diminishes the Attack Success Rate (ASR) of both harmful instructions and jailbreak attacks, while maintaining almost unchanged performance in downstream tasks.
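Cross-model guidance of this kind is often realized with activation steering: a probe flags harmful intent and a steering vector extracted from an aligned model shifts the hidden state. The sketch below assumes that linear-probe form; the names and the rule are ours:

```python
import numpy as np

def guided_hidden_state(h, probe_w, steer_v, alpha=1.0, thresh=0.0):
    """If the safety probe fires on hidden state h, add a safety steering
    vector (derived from an aligned guidance model); otherwise pass h
    through unchanged, leaving benign behavior intact."""
    return h + alpha * steer_v if float(probe_w @ h) > thresh else h
```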
arXiv Detail & Related papers (2024-01-20T10:41:03Z) - RobustCalib: Robust Lidar-Camera Extrinsic Calibration with Consistency Learning [42.90987864456673]
Current methods for LiDAR-camera extrinsics estimation depend on offline targets and human efforts.
We propose a novel approach to address the extrinsic calibration problem in a robust, automatic, and single-shot manner.
We conduct comprehensive experiments on different datasets, and the results demonstrate that our method achieves accurate and robust performance.
arXiv Detail & Related papers (2023-12-02T09:29:50Z) - DELTA: Dynamic Embedding Learning with Truncated Conscious Attention for CTR Prediction [61.68415731896613]
Click-Through Rate (CTR) prediction is a pivotal task in product and content recommendation.
We propose a model that enables Dynamic Embedding Learning with Truncated Conscious Attention for CTR prediction.
arXiv Detail & Related papers (2023-05-03T12:34:45Z) - Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification in the Presence of Data Heterogeneity [60.791736094073]
Communication overhead has become one of the major bottlenecks in the distributed training of deep neural networks.
We propose a magnitude-driven sparsification scheme, which addresses the non-convergence issue of SIGNSGD.
The proposed scheme is validated through experiments on Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
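A magnitude-driven sparsification of sign compression can be sketched as transmitting signs only for the top-k coordinates by magnitude; a minimal illustration of that idea (details of the paper's scheme, such as any error feedback, are omitted):

```python
import numpy as np

def topk_sign_compress(grad: np.ndarray, k: int) -> np.ndarray:
    """Keep 1-bit signs for the k largest-magnitude coordinates and zero
    the rest, so small (noise-dominated) coordinates stop flipping signs."""
    out = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]  # top-k by |grad|
    out[idx] = np.sign(grad[idx])
    return out
```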
arXiv Detail & Related papers (2023-02-19T17:42:35Z) - CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning [26.05184273238923]
This work aims to tackle a major challenge in offline Inverse Reinforcement Learning (IRL). We devise a principled algorithm (namely CLARE) that solves offline IRL efficiently by integrating "conservatism" into a learned reward function.
Our theoretical analysis provides an upper bound on the return gap between the learned policy and the expert policy.
arXiv Detail & Related papers (2023-02-09T17:16:29Z) - Progressive Self-Guided Loss for Salient Object Detection [102.35488902433896]
We present a progressive self-guided loss function to facilitate deep learning-based salient object detection in images.
Our framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively.
arXiv Detail & Related papers (2021-01-07T07:33:38Z) - Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
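The dynamic weighting can be written as a per-state convex combination of the two losses; the weight source here (an estimate of how useful the teacher is in the current state) is our simplification of ADVISOR's learned weighting:

```python
def advisor_style_loss(imitation_loss: float,
                       rl_loss: float,
                       teacher_usefulness: float) -> float:
    """Blend imitation and RL losses per state: imitate where the teacher's
    privileged-information policy is followable, explore elsewhere."""
    w = min(max(teacher_usefulness, 0.0), 1.0)  # clamp weight to [0, 1]
    return w * imitation_loss + (1.0 - w) * rl_loss
```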
arXiv Detail & Related papers (2020-07-23T17:59:57Z) - Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
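As a baseline for the setting described, plain synchronous gradient descent on a bilinear least-squares objective updates both factors from a shared residual; CoGD adds a coupling-aware correction on top of steps like these (the sketch shows only the baseline):

```python
import numpy as np

def bilinear_gd_step(A, x, y, lr=1e-2):
    """One synchronous step on f(A, x) = 0.5 * ||A @ x - y||**2, computing
    both factor gradients from the same residual."""
    r = A @ x - y
    grad_A = np.outer(r, x)   # df/dA
    grad_x = A.T @ r          # df/dx
    return A - lr * grad_A, x - lr * grad_x
```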
arXiv Detail & Related papers (2020-06-16T13:41:54Z)