Related papers: Ascent Fails to Forget

URL: http://arxiv.org/abs/2509.26427v2
Date: Fri, 17 Oct 2025 10:22:26 GMT
Title: Ascent Fails to Forget
Authors: Ioannis Mavrothalassitis, Pol Puigdemont, Noam Itzhak Levi, Volkan Cevher
Abstract summary: We show that gradient ascent-based unconstrained optimization methods frequently fail to perform machine unlearning. We attribute this phenomenon to the inherent statistical dependence between the forget and retain data sets. Our findings highlight that the presence of such statistical dependencies, even when manifest only as correlations, can be sufficient for ascent-based unlearning to fail.
Score: 45.75497227694833
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Contrary to common belief, we show that gradient ascent-based unconstrained optimization methods frequently fail to perform machine unlearning, a phenomenon we attribute to the inherent statistical dependence between the forget and retain data sets. This dependence, which can manifest itself even as simple correlations, undermines the misconception that these sets can be independently manipulated during unlearning. We provide empirical and theoretical evidence showing these methods often fail precisely due to this overlooked relationship. For random forget sets, this dependence means that degrading forget set metrics (which, for a retrained model, should mirror test set metrics) inevitably harms overall test performance. Going beyond random sets, we consider logistic regression as an instructive example where a critical failure mode emerges: inter-set dependence causes gradient descent-ascent iterations to progressively diverge from the ideal retrained model. Strikingly, these methods can converge to solutions that are not only far from the retrained ideal but are potentially even further from it than the original model itself, rendering the unlearning process actively detrimental. A toy example further illustrates how this dependence can trap models in inferior local minima, inescapable via finetuning. Our findings highlight that the presence of such statistical dependencies, even when manifest only as correlations, can be sufficient for ascent-based unlearning to fail. Our theoretical insights are corroborated by experiments on complex neural networks, demonstrating that these methods do not perform as expected in practice due to this unaddressed statistical interplay.
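
The logistic regression setting mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' experimental setup: the synthetic data, the L2 penalty, the step sizes, and the helper names (`grad`, `fit`, `descent_ascent`) are all assumptions chosen for brevity. It runs unconstrained gradient descent on the retain loss and ascent on the forget loss, then reports how far the result is from a model retrained on the retain set alone.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable form of 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def grad(w, X, y, l2=0.0):
    """Gradient of the mean logistic loss plus (l2 / 2) * ||w||^2."""
    return X.T @ (sigmoid(X @ w) - y) / len(y) + l2 * w

def fit(X, y, l2=1e-2, steps=2000, lr=0.5):
    """Plain gradient descent on the regularized logistic loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * grad(w, X, y, l2)
    return w

def descent_ascent(w0, Xr, yr, Xf, yf, l2=1e-2, steps=200, lr=0.1, alpha=1.0):
    """Descend on the retain loss while ascending on the forget loss."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * (grad(w, Xr, yr, l2) - alpha * grad(w, Xf, yf))
    return w

# Synthetic data (assumption): forget and retain sets are drawn from the
# same distribution, so they are statistically dependent through w_true.
rng = np.random.default_rng(0)
n, d = 400, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)

forget = np.arange(n) < 40  # first 40 samples act as a random forget set
Xf, yf, Xr, yr = X[forget], y[forget], X[~forget], y[~forget]

w_orig = fit(X, y)          # original model, trained on all data
w_retrain = fit(Xr, yr)     # reference model, retrained without the forget set
w_unlearn = descent_ascent(w_orig, Xr, yr, Xf, yf)

print("||original - retrained||  =", np.linalg.norm(w_orig - w_retrain))
print("||unlearned - retrained|| =", np.linalg.norm(w_unlearn - w_retrain))
```

Comparing the two printed distances gives a rough sense of whether the descent-ascent iterates move toward or away from the retrained model in this toy configuration; the outcome depends on the chosen `alpha`, `lr`, and number of steps.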

Related papers

On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis [0.01269104766024433]
We formalise self-training in Large Language Models (LLMs) and Generative AI as a discrete-time dynamical system. We derive two fundamental failure modes: (1) Entropy Decay, where finite sampling effects cause a monotonic loss of distributional diversity (mode collapse), and (2) Variance Amplification, where the loss of external grounding causes the model's representation of truth to drift as a random walk.
arXiv Detail & Related papers (2026-01-05T19:50:49Z)
Reference-Specific Unlearning Metrics Can Hide the Truth: A Reality Check [60.77691669644931]
We propose Functional Alignment for Distributional Equivalence (FADE), a novel metric that measures distributional similarity between unlearned and reference models. We show that FADE captures functional alignment across the entire output distribution, providing a principled assessment of genuine unlearning. These findings expose fundamental gaps in current evaluation practices and demonstrate that FADE provides a more robust foundation for developing and assessing truly effective unlearning methods.
arXiv Detail & Related papers (2025-10-14T20:50:30Z)
Improving Group Robustness on Spurious Correlation via Evidential Alignment [26.544938760265136]
Deep neural networks often learn and rely on spurious correlations, i.e., superficial associations between non-causal features and the targets. Existing methods typically mitigate this issue by using external group annotations or auxiliary deterministic models. We propose Evidential Alignment, a novel framework that leverages uncertainty quantification to understand the behavior of the biased models.
arXiv Detail & Related papers (2025-06-12T22:47:21Z)
Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning? [58.942118128503104]
Causal confusion is a phenomenon where an agent learns a policy that reflects imperfect spurious correlations in the data. This phenomenon is particularly pronounced in domains such as robotics. In this paper, we study causal confusion in offline reinforcement learning.
arXiv Detail & Related papers (2023-12-28T17:54:56Z)
Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation [57.351098530477124]
We consider one critical type of robustness against spurious correlation, where different portions of the state do not have causality but have correlations induced by unobserved confounders. A model that learns such useless or even harmful correlation could catastrophically fail when the confounder in the test case deviates from the training one. Existing robust algorithms that assume simple and unstructured uncertainty sets are therefore inadequate to address this challenge.
arXiv Detail & Related papers (2023-07-15T23:53:37Z)
Bias-inducing geometries: an exactly solvable data model with fairness implications [12.532003449620607]
We introduce an exactly solvable high-dimensional model of data imbalance. We analytically unpack the typical properties of learning models trained in this synthetic framework. We obtain exact predictions for the observables that are commonly employed for fairness assessment.
arXiv Detail & Related papers (2022-05-31T16:27:57Z)
Disentangling Observed Causal Effects from Latent Confounders using Method of Moments [67.27068846108047]
We provide guarantees on identifiability and learnability under mild assumptions. We develop efficient algorithms based on coupled tensor decomposition with linear constraints to obtain scalable and guaranteed solutions.
arXiv Detail & Related papers (2021-01-17T07:48:45Z)
Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning. These measures should account for the wide variety of models used in practice. The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning [29.473503894240096]
We focus on the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex. This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others. We propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing.
arXiv Detail & Related papers (2020-11-10T16:44:35Z)
Understanding the Failure Modes of Out-of-Distribution Generalization [35.00563456450452]
Empirical studies suggest that machine learning models often rely on features, such as the background, that may be spuriously correlated with the label only during training time. In this work, we identify the fundamental factors that give rise to this behavior, by explaining why models fail this way even in easy-to-learn tasks.
arXiv Detail & Related papers (2020-10-29T17:19:03Z)
Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity. We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective. The implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
