GRU: Mitigating the Trade-off between Unlearning and Retention for Large Language Models
- URL: http://arxiv.org/abs/2503.09117v1
- Date: Wed, 12 Mar 2025 07:08:54 GMT
- Title: GRU: Mitigating the Trade-off between Unlearning and Retention for Large Language Models
- Authors: Yue Wang, Qizhou Wang, Feng Liu, Wei Huang, Yali Du, Xiaojiang Du, Bo Han
- Abstract summary: Large language model (LLM) unlearning has demonstrated its essential role in removing privacy- and copyright-related responses. However, the pursuit of complete unlearning often comes at substantial cost to general functionality. We propose Gradient Rectified Unlearning (GRU), an enhanced unlearning framework that controls the updating gradients in a geometry-focused manner.
- Score: 34.90826139012299
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language model (LLM) unlearning has demonstrated its essential role in removing privacy- and copyright-related responses, which is crucial for the legal and safe deployment of LLMs. However, the pursuit of complete unlearning often comes at substantial cost to general functionality, leading to a notorious trade-off between unlearning and retention. By examining the update process during unlearning, we find that gradients hold essential information for revealing this trade-off. In particular, we study the varying relationship between retention performance and the directional disparities between gradients during unlearning. This motivates shaping the update from gradients of two sources, i.e., those harmful for retention and those useful for unlearning. Accordingly, we propose Gradient Rectified Unlearning (GRU), an enhanced unlearning framework that controls the updating gradients in a geometry-focused and optimization-driven manner so that their side effects on other, unrelated responses are minimized. Specifically, GRU derives a closed-form solution that projects the unlearning gradient onto the orthogonal space of the gradient harmful for retention, ensuring minimal deviation from its original direction under the condition that overall performance is retained. Comprehensive experiments demonstrate that GRU, as a general framework, is straightforward to implement and efficiently enhances a range of baseline methods through its adaptable and compatible design. Experimental results further show its broad effectiveness across a diverse set of LLM unlearning benchmarks.
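The projection step described in the abstract has a simple geometric reading: whenever the unlearning gradient conflicts with the direction needed to preserve retained abilities, its conflicting component is removed. Below is a minimal sketch of such a rectification step on flattened gradient vectors, assuming a PCGrad-style orthogonal projection; the variable names and the conflict test are illustrative assumptions, not the paper's exact closed-form derivation.

```python
import numpy as np

def rectify_gradient(g_unlearn: np.ndarray, g_retain: np.ndarray) -> np.ndarray:
    """Project the unlearning gradient onto the subspace orthogonal to the
    retention-harmful gradient whenever the two directions conflict
    (negative inner product), so the rectified update no longer carries a
    component that works against retention. Illustrative sketch only."""
    inner = float(np.dot(g_unlearn, g_retain))
    if inner >= 0.0:
        # No conflict: keep the unlearning gradient unchanged.
        return g_unlearn
    # Remove the component of g_unlearn that opposes g_retain.
    return g_unlearn - (inner / (np.dot(g_retain, g_retain) + 1e-12)) * g_retain

# Toy usage: two conflicting directions in R^2.
g_u = np.array([1.0, -1.0])   # gradient of the unlearning objective
g_r = np.array([0.0,  1.0])   # direction associated with retention
print(rectify_gradient(g_u, g_r))  # -> [1., 0.], orthogonal to g_r
```

In practice the same projection would be applied to the flattened parameter gradients of the LLM at each update; the toy vectors above are only meant to make the geometry concrete.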
Related papers
- GRAIL: Gradient-Based Adaptive Unlearning for Privacy and Copyright in LLMs [26.13653211674955]
Large Language Models (LLMs) trained on extensive datasets often learn sensitive information.
Retraining entire models from scratch to remove undesired information is both costly and impractical.
We propose GRAIL (GRadient-based AdaptIve unLearning), a novel multi-domain unlearning framework.
arXiv Detail & Related papers (2025-04-17T06:16:32Z) - SAEs $\textit{Can}$ Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs [24.48560556882878]
We introduce $\textbf{Dynamic SAE Guardrails}$ (DSG), a novel method for precision unlearning.
Our experiments show DSG substantially outperforms leading unlearning methods.
arXiv Detail & Related papers (2025-04-11T01:24:03Z) - CGLearn: Consistent Gradient-Based Learning for Out-of-Distribution Generalization [0.7366405857677226]
In this work, we introduce a simple yet powerful approach, CGLearn, which relies on the agreement of gradients across various environments.
Our proposed method demonstrates superior performance compared to state-of-the-art methods in both linear and nonlinear settings.
Comprehensive experiments on both synthetic and real-world datasets highlight its effectiveness in diverse scenarios.
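CGLearn, as summarized above, keys its updates to the agreement of gradients across training environments. The sketch below illustrates one common way such agreement can be enforced, an element-wise sign-agreement mask over per-environment gradients; the unanimity threshold and masking rule are assumptions for illustration rather than CGLearn's exact criterion.

```python
import numpy as np

def agreement_masked_gradient(env_grads, threshold: float = 1.0) -> np.ndarray:
    """Average per-environment gradients, but zero out coordinates where the
    environments disagree on the sign. `threshold` is the required fraction
    of environments sharing the majority sign (1.0 = unanimous)."""
    grads = np.stack(env_grads)               # shape: (n_envs, n_params)
    signs = np.sign(grads)
    agree_frac = np.abs(signs.mean(axis=0))   # 1.0 when all signs match
    mask = (agree_frac >= threshold).astype(grads.dtype)
    return mask * grads.mean(axis=0)

# Toy usage: three environments, four parameters.
g1 = np.array([ 0.5,  0.2, -0.1,  0.3])
g2 = np.array([ 0.4, -0.3, -0.2,  0.1])
g3 = np.array([ 0.6,  0.1, -0.3,  0.2])
print(agreement_masked_gradient([g1, g2, g3]))  # coordinate 1 is masked out
```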
arXiv Detail & Related papers (2024-11-09T02:36:39Z) - Dataset Awareness is not Enough: Implementing Sample-level Tail Encouragement in Long-tailed Self-supervised Learning [16.110763554788445]
We introduce pseudo-labels into self-supervised long-tailed learning, utilizing pseudo-label information to drive a dynamic temperature and re-weighting strategy.
We analyze the lack of quantity awareness in the temperature parameter and use re-weighting to compensate for this deficiency, thereby achieving optimal training patterns at the sample level.
arXiv Detail & Related papers (2024-10-30T10:25:22Z) - Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle it by tuning VLMs with knowledge distillation on extra datasets, which demands heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining the pre-trained knowledge of VLMs.
arXiv Detail & Related papers (2024-07-07T12:19:37Z) - Towards Effective Evaluations and Comparisons for LLM Unlearning Methods [97.2995389188179]
This paper seeks to refine the evaluation of machine unlearning for large language models.
It addresses two key challenges: the robustness of evaluation metrics and the trade-offs between competing goals.
arXiv Detail & Related papers (2024-06-13T14:41:00Z) - RobustCalib: Robust Lidar-Camera Extrinsic Calibration with Consistency Learning [42.90987864456673]
Current methods for LiDAR-camera extrinsics estimation depend on offline targets and human efforts.
We propose a novel approach to address the extrinsic calibration problem in a robust, automatic, and single-shot manner.
We conduct comprehensive experiments on different datasets, and the results demonstrate that our method achieves accurate and robust performance.
arXiv Detail & Related papers (2023-12-02T09:29:50Z) - DELTA: Dynamic Embedding Learning with Truncated Conscious Attention for CTR Prediction [61.68415731896613]
Click-Through Rate (CTR) prediction is a pivotal task in product and content recommendation.
We propose a model that enables Dynamic Embedding Learning with Truncated Conscious Attention for CTR prediction.
arXiv Detail & Related papers (2023-05-03T12:34:45Z) - Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification in the Presence of Data Heterogeneity [60.791736094073]
Communication overhead has become one of the major bottlenecks in the distributed training of deep neural networks.
We propose a magnitude-driven sparsification scheme, which addresses the non-convergence issue of SIGNSGD.
The proposed scheme is validated through experiments on Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
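For context on what a magnitude-driven sparsification of sign-based updates can look like, the sketch below keeps only the signs of the largest-magnitude coordinates and rescales them by their mean magnitude; the top-k rule and the scaling are generic assumptions, not necessarily the scheme proposed in the paper above.

```python
import numpy as np

def magnitude_aware_sign_compress(grad: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude coordinates, replace them by their sign
    scaled by the mean magnitude of the kept coordinates, zero the rest.
    A generic magnitude-aware variant of sign compression."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of top-k magnitudes
    scale = np.abs(flat[idx]).mean()
    out = np.zeros_like(flat)
    out[idx] = np.sign(flat[idx]) * scale
    return out.reshape(grad.shape)

# Toy usage: keep the 2 largest of 4 coordinates.
g = np.array([0.05, -0.9, 0.4, -0.01])
print(magnitude_aware_sign_compress(g, k=2))  # ~[0., -0.65, 0.65, 0.]
```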
arXiv Detail & Related papers (2023-02-19T17:42:35Z) - CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning [26.05184273238923]
This work aims to tackle a major challenge in offline Inverse Reinforcement Learning (IRL).
We devise a principled algorithm (namely CLARE) that solves offline IRL efficiently via integrating "conservatism" into a learned reward function.
Our theoretical analysis provides an upper bound on the return gap between the learned policy and the expert policy.
arXiv Detail & Related papers (2023-02-09T17:16:29Z) - Progressive Self-Guided Loss for Salient Object Detection [102.35488902433896]
We present a progressive self-guided loss function to facilitate deep learning-based salient object detection in images.
Our framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively.
arXiv Detail & Related papers (2021-01-07T07:33:38Z) - Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.