Related papers: Geometric-Disentangelment Unlearning

URL: http://arxiv.org/abs/2511.17100v1
Date: Fri, 21 Nov 2025 09:58:25 GMT
Title: Geometric-Disentangelment Unlearning
Authors: Duo Zhou, Yuji Zhang, Tianxin Wei, Ruizhong Qiu, Ke Yang, Xiao Lin, Cheng Qian, Jingrui He, Hanghang Tong, Heng Ji, Huan Zhang
Abstract summary: Gradient ascent on forget samples often harms retained knowledge. We propose Geometric-disentanglement Unlearning (GU), which decomposes any candidate forget-gradient update into tangential and normal components with respect to the retain space and executes only the normal component. Our method is plug-and-play and can be attached to existing gradient-based unlearning procedures to mitigate side effects.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine unlearning, the removal of a training subset's influence from a deployed model, is critical for privacy preservation and model reliability, yet gradient ascent on forget samples often harms retained knowledge. Existing approaches face a persistent tradeoff between effective forgetting and preservation of the retain set. While previous methods provide useful heuristics, they often lack a formal analysis of how exactly forgetting updates harm retained knowledge, and of whether the side effects can be removed with theoretical guarantees. To explore a theoretically sound and simple solution, we begin from first principles on how performance on the retain set is actually affected: a first-order analysis of the local change in the retain loss under small parameter updates during model training. We start from a crisp equivalence: the retain loss is unchanged to first order iff the update direction is orthogonal to the subspace spanned by retain gradients ("retain-invariant"). This identifies the entangled component as the tangential part of the forget update within the retain-gradient subspace, and characterizes disentanglement as orthogonality. Guided by this, we propose Geometric-disentanglement Unlearning (GU), which decomposes any candidate forget-gradient update into tangential and normal components with respect to the retain space and executes only the normal component. Under a standard trust-region budget, the projected direction aligned with the raw forget gradient is optimal among all first-order retain-invariant moves, and we also derive the optimal projected direction for joint forget-retain updating objectives. Our method is plug-and-play and can be attached to existing gradient-based unlearning procedures to mitigate side effects. GU achieves consistent improvements on various methods across three benchmarks: TOFU, MUSE, and WMDP.
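A minimal NumPy sketch of the projection described in the abstract, under stated assumptions: the retain gradients are stacked as the columns of a matrix `G_r`, the base method's candidate forget update is a flat vector `g_f` (the direction it would move the parameters in; for gradient ascent, the forget-loss gradient), and `rho` is a trust-region radius. All of these names and helpers are hypothetical and not taken from the paper's code. Since the first-order change of each retain loss under a step d is the corresponding entry of G_r^T d, a step is retain-invariant exactly when G_r^T d = 0. Maximizing g_f·d subject to G_r^T d = 0 and ||d|| <= rho is solved (by Cauchy-Schwarz inside the orthogonal complement) by the normalized normal component of g_f scaled to length rho, which is what `projected_forget_step` returns. The joint forget-retain variant mentioned in the abstract is not covered.

```python
import numpy as np


def split_forget_update(g_f, G_r, rel_tol=1e-10):
    """Split a candidate forget update g_f, shape (p,), into parts tangential
    and normal to span(G_r), where G_r has shape (p, k) with k >= 1 and each
    column is one retain gradient. Returns (tangential, normal)."""
    # Orthonormal basis of the retain-gradient subspace from a thin SVD;
    # near-zero singular values correspond to linearly dependent gradients.
    U, s, _ = np.linalg.svd(G_r, full_matrices=False)
    basis = U[:, s > rel_tol * s.max()]
    tangential = basis @ (basis.T @ g_f)  # component entangled with retain gradients
    normal = g_f - tangential             # first-order retain-invariant component
    return tangential, normal


def projected_forget_step(g_f, G_r, rho):
    """Hypothetical helper: the step of length rho that leaves every retain
    loss unchanged to first order (G_r.T @ d == 0) and is most aligned with g_f.
    Returns zeros if g_f lies (numerically) inside span(G_r)."""
    _, normal = split_forget_update(g_f, G_r)
    norm = np.linalg.norm(normal)
    if norm <= 1e-10 * max(1.0, np.linalg.norm(g_f)):
        return np.zeros_like(g_f)
    return rho * normal / norm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G_r = rng.standard_normal((50, 8))  # 8 retain gradients, 50 parameters
    g_f = rng.standard_normal(50)       # candidate update from the base unlearning method
    d = projected_forget_step(g_f, G_r, rho=0.1)
    # First-order change of each retain loss is G_r.T @ d; it should be ~0.
    print("max |first-order retain change|:", np.abs(G_r.T @ d).max())
    print("alignment with g_f:", g_f @ d)
```

In a real model the retain gradients would be a small batch of flattened per-batch gradients, so `k` stays far below the parameter count and the thin SVD remains cheap relative to `p`.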

Related papers

Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection
Large Language Models (LLMs) often incur an alignment tax: safety post-training can reduce general utility. We argue that this tax primarily arises from continual-learning-style forgetting in sequential alignment. We propose Orthogonal Gradient Projection for Safety Alignment (OGPSA) to balance plasticity and stability.
arXiv Detail & Related papers (2026-02-08T09:53:46Z)
Less is More: Clustered Cross-Covariance Control for Offline RL
A fundamental challenge in offline reinforcement learning is distributional shift. We propose partitioned buffer sampling that restricts updates to localized replay partitions. We also introduce an explicit gradient-based corrective penalty that cancels the covariance-induced bias within each update.
arXiv Detail & Related papers (2026-01-28T16:55:04Z)
FG-OrIU: Towards Better Forgetting via Feature-Gradient Orthogonality for Incremental Unlearning
Existing methods suppress parameters or confuse knowledge without explicit constraints at both the feature and gradient levels. We propose FG-OrIU (Feature-Gradient Orthogonality for Incremental Unlearning). It decomposes feature spaces via Singular Value Decomposition (SVD), separating forgetting and remaining class features into distinct subspaces.
arXiv Detail & Related papers (2026-01-20T04:05:13Z)
Unlearning at Scale: Implementing the Right to be Forgotten in Large Language Models
Our approach treats training as a program and logs a minimal per-microbatch record. Under a pinned software stack and deterministic kernels, replaying the training tail with the forget records filtered out yields the same parameters as training on the retain set.
arXiv Detail & Related papers (2025-08-17T03:29:22Z)
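A toy NumPy sketch of the replay argument, under loudly stated assumptions: plain SGD on least squares with a constant learning rate, and a hypothetical per-microbatch log that records only the ordered example IDs of each microbatch. The paper's actual log format and system are not reproduced here; every name below is illustrative. Because each step is deterministic, the checkpoint taken just before the first microbatch containing a forget record is identical in both runs, and replaying the tail with forget records filtered out gives parameters bit-identical to running the same schedule on the retain set alone.

```python
import numpy as np


def sgd_step(w, X, y, idx, lr):
    """One deterministic least-squares SGD step on the rows listed in idx."""
    if len(idx) == 0:  # microbatch emptied by filtering: no update
        return w
    Xb, yb = X[idx], y[idx]
    grad = Xb.T @ (Xb @ w - yb) / len(idx)
    return w - lr * grad


def train(w0, X, y, log, lr, start=0, forget=frozenset()):
    """Replay logged microbatches log[start:], dropping any record in `forget`."""
    w = w0.copy()
    for batch in log[start:]:
        kept = np.array([i for i in batch if i not in forget], dtype=int)
        w = sgd_step(w, X, y, kept, lr)
    return w


rng = np.random.default_rng(0)
n, p, bs, lr = 64, 5, 8, 0.05
X, y = rng.standard_normal((n, p)), rng.standard_normal(n)
# Hypothetical per-microbatch log: the ordered record IDs of each microbatch.
log = [rng.permutation(n)[:bs].tolist() for _ in range(40)]
w0 = np.zeros(p)
forget = frozenset({3, 17, 42})

# First step whose microbatch touches a forget record (len(log) if none does).
first = next((t for t, b in enumerate(log) if forget & set(b)), len(log))
ckpt = train(w0, X, y, log[:first], lr)  # checkpoint just before that step
w_replay = train(ckpt, X, y, log, lr, start=first, forget=forget)

# Reference: the same schedule run on the retain set from the beginning.
w_retain = train(w0, X, y, log, lr, forget=forget)
print("bit-identical:", np.array_equal(w_replay, w_retain))
```

The check relies on both runs executing identical floating-point operations in the same order, which is the role the summary assigns to a pinned stack and deterministic kernels.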
Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models
Large Language Models (LLMs) deployed in real-world settings increasingly face the need to unlearn sensitive, outdated, or proprietary information. Existing unlearning methods formulate forgetting and retention as a regularized trade-off, combining both objectives into a single scalarized loss. We propose a new formulation of LLM unlearning as a constrained optimization problem: forgetting is enforced via a novel logit-margin flattening loss.
arXiv Detail & Related papers (2025-06-05T17:55:23Z)
Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision-boundary distortion. Our method achieves competitive accuracy while requiring no exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
DPSUR: Accelerating Differentially Private Stochastic Gradient Descent Using Selective Update and Release
This paper proposes DPSUR, a differentially private training framework based on Selective Updates and Release. The main challenges lie in two aspects: privacy concerns and the gradient selection strategy for model updates. Experiments conducted on the MNIST, FMNIST, CIFAR-10, and IMDB datasets show that DPSUR significantly outperforms previous works in terms of convergence speed.
arXiv Detail & Related papers (2023-11-23T15:19:30Z)
GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy. Recent studies find that an attacker can invert the shared gradients and recover sensitive data from an FL system by leveraging pre-trained generative adversarial networks (GANs) as prior knowledge. We propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z)
Large Scale Private Learning via Low-rank Reparametrization
We propose a reparametrization scheme to address the challenges of applying differentially private SGD to large neural networks. We are the first to apply differential privacy to the BERT model, achieving an average accuracy of 83.9% on four downstream tasks.
arXiv Detail & Related papers (2021-06-17T10:14:43Z)
Correcting Momentum in Temporal Difference Learning
We argue that momentum in Temporal Difference (TD) learning accumulates gradients that become doubly stale. We show that this phenomenon exists, and then propose a first-order correction term to momentum. An important insight of this work is that deep RL methods are not always best served by directly importing techniques from the supervised setting.
arXiv Detail & Related papers (2021-06-07T20:41:15Z)
Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime. We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z)
Understanding Gradient Clipping in Private SGD: A Geometric Perspective
Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information. Many learning systems now incorporate differential privacy by training their models with (differentially) private SGD. A key step in each private SGD update is gradient clipping that shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold.
arXiv Detail & Related papers (2020-06-27T19:08:12Z)
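The clipping rule in the summary, shrinking an individual example's gradient whenever its L2 norm exceeds a threshold, can be sketched in NumPy as follows. The `private_mean_gradient` helper is a hypothetical illustration of the standard DP-SGD aggregation (sum of clipped per-example gradients plus Gaussian noise whose scale is proportional to the clipping threshold, divided by the batch size); it is not code from the listed paper and performs no privacy accounting.

```python
import numpy as np


def clip_per_example(grads, clip_norm):
    """Rescale each row (one example's gradient) to L2 norm at most clip_norm:
    g <- g * min(1, clip_norm / ||g||_2). Rows already within the bound are unchanged."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))  # guard zero rows
    return grads * scale


def private_mean_gradient(grads, clip_norm, noise_multiplier, rng):
    """Clip per example, sum, add Gaussian noise with std noise_multiplier * clip_norm,
    then divide by the batch size."""
    clipped = clip_per_example(grads, clip_norm)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / grads.shape[0]


if __name__ == "__main__":
    g = np.array([[3.0, 4.0], [0.3, 0.4]])
    # First row (norm 5) shrinks to norm 1; second row (norm 0.5) is unchanged.
    print(clip_per_example(g, clip_norm=1.0))
```

Clipping bounds each example's contribution to the sum by `clip_norm`, which is why the noise scale is tied to that same threshold.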

This list is automatically generated from the titles and abstracts of the papers on this site.