Label Smoothing Improves Gradient Ascent in LLM Unlearning
- URL: http://arxiv.org/abs/2510.22376v1
- Date: Sat, 25 Oct 2025 17:43:34 GMT
- Authors: Zirui Pang, Hao Zheng, Zhijie Deng, Ling Li, Zixin Zhong, Jiaheng Wei,
- Abstract summary: We propose Smoothed Gradient Ascent (SGA) for LLM unlearning. SGA combines the forget data with multiple constructed normal data through a tunable smoothing rate. We evaluate SGA on three benchmarks: TOFU, Harry Potter, and MUSE-NEWS.
- Score: 31.069520631133724
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LLM unlearning has emerged as a promising approach, aiming to enable models to forget hazardous/undesired knowledge at low cost while preserving as much model utility as possible. Among existing techniques, the most straightforward method is performing Gradient Ascent (GA) w.r.t. the forget data, thereby forcing the model to unlearn the forget dataset. However, GA suffers from severe instability, as it drives updates in a divergent direction, often resulting in drastically degraded model utility. To address this issue, we propose Smoothed Gradient Ascent (SGA). SGA combines the forget data with multiple constructed normal data through a tunable smoothing rate. Intuitively, this extends GA from learning solely on the forget data to jointly learning across both forget and normal data, enabling more stable unlearning while better preserving model utility. Theoretically, we provide guidance on selecting the optimal smoothing rate. Empirically, we evaluate SGA on three benchmarks: TOFU, Harry Potter, and MUSE-NEWS. Experimental results demonstrate that SGA consistently outperforms the original GA method across all metrics and achieves top-2 performance among all baseline methods on several key metrics.
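To make the label-smoothing analogy concrete, here is a minimal Python sketch under stated assumptions. The abstract does not give SGA's exact objective, so `smoothed_ga_loss` is a hypothetical illustration, not the authors' method: it assumes a weight of (1 - r) on gradient ascent over a forget example and a weight r spread evenly over N constructed normal examples trained by ordinary descent, mirroring classic label smoothing, whose target is (1 - alpha) * one_hot + alpha / K. All function names are illustrative, and logits are plain Python lists rather than real model outputs.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits: Sequence[float], target: Sequence[float]) -> float:
    """Cross-entropy H(target, softmax(logits)) for a single prediction."""
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))


def one_hot(k: int, num_classes: int) -> List[float]:
    return [1.0 if i == k else 0.0 for i in range(num_classes)]


def smoothed_label(k: int, num_classes: int, alpha: float) -> List[float]:
    """Classic label smoothing: (1 - alpha) * one_hot(k) + alpha / K."""
    return [(1.0 - alpha) * y + alpha / num_classes for y in one_hot(k, num_classes)]


def ga_loss(forget_logits: Sequence[float], forget_token: int) -> float:
    """Gradient ascent (GA): minimizing this negated NLL *raises* the NLL
    of the forget token, i.e. it ascends the usual training loss."""
    return -cross_entropy(forget_logits, one_hot(forget_token, len(forget_logits)))


def smoothed_ga_loss(
    forget_logits: Sequence[float],
    forget_token: int,
    normal: Sequence[Tuple[Sequence[float], int]],
    r: float,
) -> float:
    """HYPOTHETICAL smoothed-GA objective (an assumption, not the paper's formula).

    Weight (1 - r) goes to ascent on the forget example; weight r is split
    evenly over the N "normal" examples, which are trained by ordinary
    descent. With r = 0 this reduces to plain GA.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("smoothing rate r must lie in [0, 1]")
    loss = (1.0 - r) * ga_loss(forget_logits, forget_token)
    if normal:
        per_item = r / len(normal)
        for logits, token in normal:
            loss += per_item * cross_entropy(logits, one_hot(token, len(logits)))
    return loss


if __name__ == "__main__":
    forget_logits = [2.0, 0.5, -1.0]
    normal_data = [([0.1, 1.5, 0.2], 1), ([1.0, 0.0, 3.0], 2)]
    print(smoothed_label(0, 3, 0.1))
    # True: r = 0 recovers plain GA
    print(smoothed_ga_loss(forget_logits, 0, normal_data, 0.0) == ga_loss(forget_logits, 0))
    print(smoothed_ga_loss(forget_logits, 0, normal_data, 0.3))
```

Setting r = 0 recovers plain GA, consistent with the abstract's framing of SGA as an extension of GA. How the normal data are actually constructed and how the optimal r is chosen are specified in the paper, not in this sketch.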
Related papers
- Is Gradient Ascent Really Necessary? Memorize to Forget for Machine Unlearning [71.96329385684395]
We propose model extrapolation as an alternative to gradient ascent (GA). Counterfactual as it might sound, a forget model can be obtained via extrapolation from the memorization model to the reference model. Our model extrapolation is simple and efficient to implement, and it can also effectively converge throughout training to achieve improved unlearning performance.
arXiv Detail & Related papers (2026-02-06T07:11:27Z)
- LARGE: A Locally Adaptive Regularization Approach for Estimating Gaussian Graphical Models [2.3696387635465608]
We develop Locally Adaptive Regularization for Graph Estimation (LARGE). LARGE is an approach to adaptively learn nodewise tuning parameters to improve graph estimation and selection. We demonstrate the practical utility of our method by estimating brain connectivity from a real fMRI data set.
arXiv Detail & Related papers (2026-01-14T18:37:50Z)
- AL-GNN: Privacy-Preserving and Replay-Free Continual Graph Learning via Analytic Learning [8.911446190681882]
Continual graph learning (CGL) aims to enable graph neural networks to incrementally learn from a stream of graph-structured data without forgetting previously acquired knowledge. We propose AL-GNN, a novel framework for continual graph learning that eliminates the need for backpropagation and replay buffers.
arXiv Detail & Related papers (2025-12-20T09:55:36Z)
- FreeGAD: A Training-Free yet Effective Approach for Graph Anomaly Detection [54.576802512108685]
Graph Anomaly Detection (GAD) aims to identify nodes that deviate from the majority within a graph. Existing approaches often suffer from high deployment costs and poor scalability due to their complex and resource-intensive training processes. We propose FreeGAD, a novel training-free yet effective GAD method.
arXiv Detail & Related papers (2025-08-14T12:37:20Z)
- LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation [9.181689366185038]
Graph neural networks (GNNs) are a powerful learning approach for graph-based recommender systems.
In this paper, we propose a simple yet effective graph contrastive learning paradigm LightGCL.
arXiv Detail & Related papers (2023-02-16T10:16:21Z)
- Efficient Bi-Level Optimization for Recommendation Denoising [31.968068788022403]
Implicit feedback possesses a high degree of noise, which significantly undermines recommendation quality.
We model recommendation denoising as a bi-level optimization problem.
The inner optimization aims to derive an effective model for the recommendation, as well as to guide the weight determination.
We employ a weight generator to avoid the storage of weights and a one-step gradient-matching-based loss to significantly reduce computational time.
arXiv Detail & Related papers (2022-10-19T06:36:21Z)
- Contrastive Graph Few-Shot Learning [67.01464711379187]
We propose a Contrastive Graph Few-shot Learning framework (CGFL) for graph mining tasks.
CGFL learns data representations in a self-supervised manner, thereby mitigating the impact of distribution shift for better generalization.
Comprehensive experiments demonstrate that CGFL outperforms state-of-the-art baselines on several graph mining tasks.
arXiv Detail & Related papers (2022-09-30T20:40:23Z)
- Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z)
- Robust Optimization as Data Augmentation for Large-scale Graphs [117.2376815614148]
We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training.
FLAG is a general-purpose approach for graph data, which universally works in node classification, link prediction, and graph classification tasks.
arXiv Detail & Related papers (2020-10-19T21:51:47Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$, the mixing time of the underlying Markov chain.
We propose an algorithm based on experience replay, a popular reinforcement learning technique, that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)