Variance-Reduced $(\varepsilon,\delta)$-Unlearning using Forget Set Gradients
- URL: http://arxiv.org/abs/2602.14938v1
- Date: Mon, 16 Feb 2026 17:20:14 GMT
- Title: Variance-Reduced $(\varepsilon,\delta)$-Unlearning using Forget Set Gradients
- Authors: Martin Van Waerebeke, Marco Lorenzi, Kevin Scaman, El Mahdi El Mhamdi, Giovanni Neglia
- Abstract summary: Variance-Reduced Unlearning is a first-order algorithm that directly includes forget set gradients in its update rule. We show that incorporating the forget set yields strictly improved rates, i.e., a better dependence on the achieved error.
- Score: 21.428036263207243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In machine unlearning, $(\varepsilon,\delta)$-unlearning is a popular framework that provides formal guarantees on the effectiveness of the removal of a subset of training data, the forget set, from a trained model. For strongly convex objectives, existing first-order methods achieve $(\varepsilon,\delta)$-unlearning, but they only use the forget set to calibrate injected noise, never as a direct optimization signal. In contrast, efficient empirical heuristics often exploit the forget samples (e.g., via gradient ascent) but come with no formal unlearning guarantees. We bridge this gap by presenting the Variance-Reduced Unlearning (VRU) algorithm. To the best of our knowledge, VRU is the first first-order algorithm that directly includes forget set gradients in its update rule, while provably satisfying $(\varepsilon,\delta)$-unlearning. We establish the convergence of VRU and show that incorporating the forget set yields strictly improved rates, i.e., a better dependence on the achieved error compared to existing first-order $(\varepsilon,\delta)$-unlearning methods. Moreover, we prove that, in a low-error regime, VRU asymptotically outperforms any first-order method that ignores the forget set. Experiments corroborate our theory, showing consistent gains over both state-of-the-art certified unlearning methods and over empirical baselines that explicitly leverage the forget set.
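The abstract does not spell out VRU's update rule, so the sketch below is only a hypothetical illustration of the general recipe it describes: a first-order loop that starts from the trained model, folds forget-set gradients directly into each step via the exact leave-one-out identity for the retain-set gradient, and injects Gaussian noise to target an $(\varepsilon,\delta)$-style guarantee. The function names, the specific identity, and the (uncalibrated) noise level are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def unlearn_sketch(theta, grad_full, grad_forget, n, n_forget,
                   n_steps=100, lr=0.1, noise_std=0.01, rng=None):
    """Hypothetical first-order unlearning loop (NOT the paper's VRU).

    grad_full(theta)   : gradient of the average loss over all n training points
    grad_forget(theta) : gradient of the average loss over the n_forget points to remove

    The retain-set gradient follows exactly from
        (n - n_forget) * g_retain = n * g_full - n_forget * g_forget,
    so forget-set gradients enter the update rule directly. Calibrating
    noise_std to a genuine (eps, delta) guarantee needs the paper's analysis.
    """
    rng = rng or np.random.default_rng(0)
    for _ in range(n_steps):
        g_retain = (n * grad_full(theta) - n_forget * grad_forget(theta)) / (n - n_forget)
        theta = theta - lr * g_retain                 # gradient step on the retain loss
    return theta + noise_std * rng.standard_normal(theta.shape)  # noise for indistinguishability

# Toy check on a strongly convex loss f_i(theta) = 0.5 * ||theta - x_i||^2:
X = np.random.default_rng(1).normal(size=(50, 5))
forget, retain = X[:10], X[10:]
theta_u = unlearn_sketch(
    theta=X.mean(axis=0),                             # minimizer of the full loss
    grad_full=lambda t: t - X.mean(axis=0),
    grad_forget=lambda t: t - forget.mean(axis=0),
    n=len(X), n_forget=len(forget),
)
print("distance to retain-only minimizer:", np.linalg.norm(theta_u - retain.mean(axis=0)))
```

Up to the injected noise, this toy loop simply converges to the retain-only minimizer; per the abstract, VRU's value is that exploiting forget-set gradients yields strictly better error dependence than first-order methods that ignore them, while keeping the $(\varepsilon,\delta)$ certificate.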
Related papers
- Is Gradient Ascent Really Necessary? Memorize to Forget for Machine Unlearning [71.96329385684395]
We propose model extrapolation as an alternative to gradient ascent (GA).
Counterfactual as it might sound, a forget model can be obtained via extrapolation from the memorization model to the reference model.
Our model extrapolation is simple and efficient to implement, and it can also effectively converge throughout training to achieve improved unlearning performance.
arXiv Detail & Related papers (2026-02-06T07:11:27Z) - Grokked Models are Better Unlearners [5.8757712547216485]
Starting from grokked checkpoints consistently yields more efficient forgetting.
Post-grokking models learn more modular representations with reduced gradient alignment between the forget and retain subsets.
arXiv Detail & Related papers (2025-12-03T04:35:49Z) - BLUR: A Bi-Level Optimization Approach for LLM Unlearning [100.90394814817965]
We argue that it is important to model the hierarchical structure of the unlearning problem.
We propose a novel algorithm, termed Bi-Level UnleaRning (BLUR), which delivers superior performance.
arXiv Detail & Related papers (2025-06-09T19:23:05Z) - Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions [22.183624306817563]
Machine unlearning algorithms aim to efficiently remove data from a model without retraining it from scratch.
Certified machine unlearning is a strong theoretical guarantee based on differential privacy.
arXiv Detail & Related papers (2024-09-15T15:58:08Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability of the VLMs' zero-shot generalization; the resulting method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Is Retain Set All You Need in Machine Unlearning? Restoring Performance of Unlearned Models with Out-Of-Distribution Images [0.0]
We introduce Selective-distillation for Class and Architecture-agnostic unleaRning (SCAR).
SCAR efficiently eliminates specific information while preserving the model's test accuracy without using a retain set.
We experimentally verified the effectiveness of our method, on three public datasets.
arXiv Detail & Related papers (2024-04-19T14:45:27Z) - Achieving Constant Regret in Linear Markov Decision Processes [57.34287648914407]
We introduce an algorithm, Cert-LSVI-UCB, for misspecified linear Markov decision processes (MDPs).
We show that Cert-LSVI-UCB has a cumulative regret of $\tilde{\mathcal{O}}(d^3 H^5 / \Delta)$ with high probability, provided that the misspecification level $\zeta$ is below $\tilde{\mathcal{O}}(\Delta / (\sqrt{d} H^2))$.
arXiv Detail & Related papers (2024-04-16T17:23:19Z) - Hessian-Free Online Certified Unlearning [8.875278412741695]
We develop an online unlearning algorithm that achieves near-instantaneous data removal.
We prove that our proposed method outperforms state-of-the-art methods in terms of unlearning and generalization guarantees.
arXiv Detail & Related papers (2024-04-02T07:54:18Z) - Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation [69.0695698566235]
We study reinforcement learning with linear function approximation and adversarially changing cost functions.
We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback.
arXiv Detail & Related papers (2023-01-30T17:26:39Z) - spred: Solving $L_1$ Penalty with SGD [6.2255027793924285]
We propose to minimize a differentiable objective with an $L_1$ penalty using a simple reparametrization.
We prove that the reparametrization trick is completely "benign" for a generic nonconvex function. A minimal illustration of this reparametrization appears after this list.
arXiv Detail & Related papers (2022-10-03T20:07:51Z) - Balanced Self-Paced Learning for AUC Maximization [88.53174245457268]
Existing self-paced methods are limited to pointwise AUC.
Our algorithm converges to a stationary point on the basis of closed-form solutions.
arXiv Detail & Related papers (2022-07-08T02:09:32Z) - Towards Demystifying Representation Learning with Non-contrastive Self-supervision [82.80118139087676]
Non-contrastive methods of self-supervised learning learn representations by minimizing the distance between two views of the same image.
Tian et al. (2021) made an initial attempt on the first question and proposed DirectPred that sets the predictor directly.
We show that in a simple linear network, DirectSet($\alpha$) provably learns a desirable projection matrix and also reduces the sample complexity on downstream tasks.
arXiv Detail & Related papers (2021-10-11T00:48:05Z)
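Relatedly, the spred entry above refers to a reparametrization that lets plain SGD handle an $L_1$ penalty. A minimal sketch of that classical trick, on assumed toy data rather than the paper's code: write each weight as $w = u \cdot v$ and apply ordinary $L_2$ weight decay to $u$ and $v$; at a global optimum this matches an $L_1$ penalty on $w$, since $\min_{uv=w} (u^2+v^2)/2 = |w|$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam, lr = 200, 20, 0.1, 1e-2
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]                 # sparse ground truth
y = X @ w_true + 0.1 * rng.normal(size=n)

# spred-style reparametrization: w = u * v, with plain L2 weight decay on u and v.
u = rng.normal(scale=0.1, size=d)
v = rng.normal(scale=0.1, size=d)
for _ in range(5000):                          # ordinary full-batch gradient descent
    w = u * v
    g_w = X.T @ (X @ w - y) / n                # gradient of the smooth fit term w.r.t. w
    u, v = u - lr * (g_w * v + lam * u), v - lr * (g_w * u + lam * v)

w_hat = u * v
print("estimated support:", np.flatnonzero(np.abs(w_hat) > 1e-3))   # expect roughly [0 1 2]
print("leading weights:", np.round(w_hat[:3], 2))                   # shrunk toward zero, as L1 would
```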