Stable Forgetting: Bounded Parameter-Efficient Unlearning in LLMs
- URL: http://arxiv.org/abs/2509.24166v1
- Date: Mon, 29 Sep 2025 01:30:15 GMT
- Title: Stable Forgetting: Bounded Parameter-Efficient Unlearning in LLMs
- Authors: Arpit Garg, Hemanth Saratchandran, Ravi Garg, Simon Lucey
- Abstract summary: We provide a theoretical framework that explains how ascent on the forget set destabilizes optimization in the feedforward layers of large language models (LLMs). We propose Bounded Parameter-Efficient Unlearning, a parameter-efficient approach that stabilizes fine-tuning by applying bounded functions to adapters. Our method achieves substantial improvements in forgetting while preserving retention, establishing a novel, theoretically grounded, and practically scalable framework for unlearning in LLMs.
- Score: 30.089412595436585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine unlearning in large language models (LLMs) is essential for privacy and safety; however, existing approaches remain unstable and unreliable. A widely used strategy, the gradient difference method, applies gradient descent on retained data while performing gradient ascent on forget data, the data whose influence should be removed. However, when combined with cross-entropy loss, this procedure causes unbounded growth of weights and gradients, leading to training instability and degrading both forgetting and retention. We provide a theoretical framework that explains this failure, explicitly showing how ascent on the forget set destabilizes optimization in the feedforward MLP layers of LLMs. Guided by this insight, we propose Bounded Parameter-Efficient Unlearning, a parameter-efficient approach that stabilizes LoRA-based fine-tuning by applying bounded functions to MLP adapters. This simple modification controls the weight dynamics during ascent, enabling the gradient difference method to converge reliably. Across the TOFU, TDEC, and MUSE benchmarks, and across architectures and scales from 125M to 8B parameters, our method achieves substantial improvements in forgetting while preserving retention, establishing a novel theoretically grounded and practically scalable framework for unlearning in LLMs.
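The instability described in the abstract can be illustrated on a toy scalar model: gradient descent on a retain example combined with gradient ascent on a forget example, with and without a bounding function on the adapter's effective weight. This is a minimal sketch only; the squared-error loss, the single scalar "adapter" weight, the choice of tanh as the bounding function, and the forget coefficient `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import math

# Toy sketch of gradient-difference unlearning on a scalar linear model
# y_hat = w_eff * x. Assumptions (not the paper's recipe): squared-error
# loss instead of cross-entropy, one scalar weight, tanh as the bound.

def grad_diff_step(w, x_r, y_r, x_f, y_f, lr=0.1, lam=1.0, bounded=False):
    """Descend on the retain loss and ascend on the forget loss."""
    w_eff = math.tanh(w) if bounded else w              # bounded adapter output
    d_eff = (1.0 - w_eff * w_eff) if bounded else 1.0   # chain-rule factor
    g_retain = 2.0 * (w_eff * x_r - y_r) * x_r * d_eff
    g_forget = 2.0 * (w_eff * x_f - y_f) * x_f * d_eff
    return w - lr * (g_retain - lam * g_forget)

w_unbounded = w_bounded = 0.5
for _ in range(200):
    # retain target +1, forget target -1, on the same input x = 1
    w_unbounded = grad_diff_step(w_unbounded, 1.0, 1.0, 1.0, -1.0)
    w_bounded = grad_diff_step(w_bounded, 1.0, 1.0, 1.0, -1.0, bounded=True)

# Plain ascent drives the raw weight up without bound, while the tanh
# parameterisation keeps the effective weight inside (-1, 1).
print(w_unbounded)           # grows roughly linearly with the step count
print(math.tanh(w_bounded))  # effective weight stays bounded
```

Because tanh saturates, the ascent gradient through the bound vanishes as the raw weight grows, so the update cannot diverge; this mirrors, in miniature, the bounded-adapter argument made in the abstract.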
Related papers
- GradAlign: Gradient-Aligned Data Selection for LLM Reinforcement Learning [55.03441672267886]
We propose GradAlign, a gradient-aligned data selection method for reinforcement learning. We evaluate GradAlign across three data regimes: unreliable reward signals, distribution imbalance, and low-utility training corpora.
arXiv Detail & Related papers (2026-02-25T01:54:50Z) - CATNIP: LLM Unlearning via Calibrated and Tokenized Negative Preference Alignment [14.853204323785334]
Existing approaches, rooted in Gradient Ascent (GA), often degrade general-domain knowledge while relying on retention data or curated contrastive pairs. We develop a principled method that rescales unlearning effects in proportion to the model's token-level confidence. Our work enables effective unlearning without requiring retention data or contrastive unlearning response pairs.
arXiv Detail & Related papers (2026-02-02T21:23:54Z) - Ghosting Your LLM: Without The Knowledge of Your Gradient and Data [23.256253959062544]
We focus on the impact of Bit Flip Attacks (BFAs) on large language models (LLMs). BFAs exploit hardware faults to corrupt model parameters, posing a significant threat to model integrity and performance. We propose novel vulnerability index metrics that can identify vulnerable weight bits in LLMs independent of any gradient or data knowledge.
arXiv Detail & Related papers (2025-11-27T18:52:51Z) - MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics [72.00014675808228]
Instability in the evaluation process of Large Language Models obscures their true learning dynamics. We introduce MaP, a framework that integrates Merging and the Pass@k metric. Experiments show that MaP yields significantly smoother performance curves, reduces inter-run variance, and ensures more consistent rankings.
arXiv Detail & Related papers (2025-10-10T11:40:27Z) - Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning [77.92320830700797]
Reinforcement Learning has played a central role in enabling reasoning capabilities of Large Language Models. We propose a tractable computational framework that tracks and leverages curvature information during policy updates. The algorithm, Curvature-Aware Policy Optimization (CAPO), identifies samples that contribute to unstable updates and masks them out.
arXiv Detail & Related papers (2025-10-01T12:29:32Z) - Non-Linear Trajectory Modeling for Multi-Step Gradient Inversion Attacks in Federated Learning [16.19043018432204]
We propose Non-Linear Surrogate Model Extension (NL-SME), the first method to introduce nonlinear parametric trajectory modeling for Gradient Inversion Attacks (GIAs). Our approach replaces the linear surrogate path with learnable quadratic Bézier curves that capture SGD's curved characteristics through control points, combined with regularization and dvec scaling mechanisms for enhanced expressiveness.
arXiv Detail & Related papers (2025-09-26T09:04:25Z) - LLM Unlearning using Gradient Ratio-Based Influence Estimation and Noise Injection [0.0]
Existing empirical methods often yield incomplete forgetting or unintended degradation of unrelated knowledge due to poor localization. GRIN introduces a novel gradient-ratio-based metric to identify the parameters most responsible for memorizing forget data. We then perform selective noise injection into these parameters prior to fine-tuning, which improves unlearning performance while maintaining model utility.
arXiv Detail & Related papers (2025-08-08T17:15:32Z) - Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models [7.566515311806724]
Large Language Models (LLMs) deployed in real-world settings increasingly face the need to unlearn sensitive, outdated, or proprietary information. Existing unlearning methods formulate forgetting and retention as a regularized trade-off, combining both objectives into a single scalarized loss. We propose a new formulation of LLM unlearning as a constrained optimization problem: forgetting is enforced via a novel logit-margin flattening loss.
arXiv Detail & Related papers (2025-06-05T17:55:23Z) - Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models [21.16132396642158]
Training stability is a persistent challenge in the pre-training of large language models (LLMs). We propose Scale-Distribution Decoupling (SDD), a novel approach that stabilizes training by explicitly decoupling the scale and distribution of the weight matrix in fully-connected layers.
arXiv Detail & Related papers (2025-02-21T14:49:34Z) - Large Continual Instruction Assistant [59.585544987096974]
Continual Instruction Tuning (CIT) is adopted to instruct Large Models to follow human intent, dataset by dataset. Existing gradient updates heavily degrade performance on previous datasets during the CIT process. We propose a general continual instruction tuning framework to address the challenge.
arXiv Detail & Related papers (2024-10-08T11:24:59Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.