Why Some Models Resist Unlearning: A Linear Stability Perspective
- URL: http://arxiv.org/abs/2602.02986v1
- Date: Tue, 03 Feb 2026 01:47:26 GMT
- Title: Why Some Models Resist Unlearning: A Linear Stability Perspective
- Authors: Wei-Kai Chang, Rajiv Khanna
- Abstract summary: We frame unlearning through the lens of linear stability. We decompose coherence along three axes: within the retain set, within the forget set, and between them. To further link data properties to forgettability, we study a two-layer ReLU CNN under a signal-plus-noise model. For empirical verification, we show that Hessian tests and CNN heatmaps align closely with the predicted boundary, mapping the stability frontier of gradient-based unlearning as a function of batching, mixing, and data/model alignment.
- Score: 7.446140380340418
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine unlearning, the ability to erase the effect of specific training samples without retraining from scratch, is critical for privacy, regulation, and efficiency. However, most progress in unlearning has been empirical, with little theoretical understanding of when and why unlearning works. We tackle this gap by framing unlearning through the lens of asymptotic linear stability to capture the interaction between optimization dynamics and data geometry. The key quantity in our analysis is data coherence, which is the cross-sample alignment of loss surface directions near the optimum. We decompose coherence along three axes: within the retain set, within the forget set, and between them, and prove tight stability thresholds that separate convergence from divergence. To further link data properties to forgettability, we study a two-layer ReLU CNN under a signal-plus-noise model and show that stronger memorization makes forgetting easier: when the signal-to-noise ratio (SNR) is lower, cross-sample alignment is weaker, reducing coherence and making unlearning easier; conversely, high-SNR, highly aligned models resist unlearning. For empirical verification, we show that Hessian tests and CNN heatmaps align closely with the predicted boundary, mapping the stability frontier of gradient-based unlearning as a function of batching, mixing, and data/model alignment. Our analysis is grounded in random matrix theory tools and provides the first principled account of the trade-offs between memorization, coherence, and unlearning.
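The abstract's coherence decomposition can be probed empirically. Below is a minimal sketch of one such probe: per-sample gradients near the optimum are compared by cosine similarity, and the mean alignment is computed within the retain set, within the forget set, and between them. The function name and the cosine-similarity proxy are illustrative assumptions; the paper's actual coherence measure is defined via loss surface directions and random matrix theory, not this exact statistic.

```python
import numpy as np

def coherence_blocks(grads, retain_idx, forget_idx):
    """Mean pairwise cosine alignment of per-sample gradients,
    split into retain/retain, forget/forget, and cross blocks.
    A rough empirical proxy for the coherence decomposition."""
    # Normalize each per-sample gradient to unit length.
    G = grads / (np.linalg.norm(grads, axis=1, keepdims=True) + 1e-12)
    C = G @ G.T  # cosine-similarity matrix

    def block_mean(rows, cols):
        sub = C[np.ix_(rows, cols)]
        if rows is cols:  # exclude self-similarity on the diagonal
            n = len(rows)
            return (sub.sum() - n) / (n * (n - 1))
        return sub.mean()

    return {
        "retain": block_mean(retain_idx, retain_idx),
        "forget": block_mean(forget_idx, forget_idx),
        "cross": block_mean(retain_idx, forget_idx),
    }

# Toy signal-plus-noise example: retain gradients share a common
# signal direction (high coherence); forget gradients are pure noise.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16)
retain = signal + 0.1 * rng.standard_normal((8, 16))
forget = rng.standard_normal((8, 16))
grads = np.vstack([retain, forget])
blocks = coherence_blocks(grads, list(range(8)), list(range(8, 16)))
```

In this toy setup the retain block shows near-unit alignment while the forget and cross blocks hover near zero, mirroring the paper's low-SNR intuition that weak cross-sample alignment should make forgetting easier.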
Related papers
- Advancing Analytic Class-Incremental Learning through Vision-Language Calibration [6.871141687303144]
Class-incremental learning (CIL) with pre-trained models (PTMs) faces a critical trade-off between efficient adaptation and long-term stability. We propose VILA, a novel dual-branch framework that advances analytic CIL via a two-level vision-language calibration strategy. Our framework harmonizes high-fidelity prediction with the simplicity of analytic learning.
arXiv Detail & Related papers (2026-02-14T08:32:51Z) - Binary Flow Matching: Prediction-Loss Space Alignment for Robust Learning [23.616336786063552]
Flow matching has emerged as a powerful framework for generative modeling. We identify a latent structural mismatch that arises when it is coupled with velocity-based objectives. We prove that re-aligning the objective to the signal space eliminates the singular weighting.
arXiv Detail & Related papers (2026-02-11T02:02:30Z) - Robustness of Probabilistic Models to Low-Quality Data: A Multi-Perspective Analysis [23.834741751854448]
A systematic, comparative investigation into the effects of low-quality data reveals a stark spectrum of robustness across modern probabilistic models. We find that autoregressive language models, from token prediction to sequence-to-sequence tasks, are remarkably resilient. Under the same levels of data corruption, class-conditional diffusion models degrade catastrophically.
arXiv Detail & Related papers (2025-12-11T02:10:41Z) - Phase transitions reveal hierarchical structure in deep neural networks [0.0]
We show that phase transitions in Deep Neural Networks are governed by saddle points in the loss landscape. We introduce a simple, fast, and easy-to-implement algorithm that uses the L2 regularizer as a tool to probe the geometry of error landscapes.
arXiv Detail & Related papers (2025-12-05T15:14:09Z) - A Unified Stability Analysis of SAM vs SGD: Role of Data Coherence and Emergence of Simplicity Bias [7.446140380340418]
Stochastic gradient descent (SGD) and its variants reliably find solutions that generalize well, but the mechanisms driving this generalization remain unclear. We develop a linear stability framework that analyzes the behavior of SGD, random perturbations, and SAM, particularly in two-layer ReLU networks. Central to our analysis is a coherence measure that quantifies how gradient curvature aligns across data points, revealing why certain minima are stable and favored during training.
arXiv Detail & Related papers (2025-11-21T16:41:14Z) - MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics [72.00014675808228]
Instability in the evaluation process of Large Language Models obscures true learning dynamics. We introduce MaP, a framework that integrates model Merging and the Pass@k metric. Experiments show that MaP yields significantly smoother performance curves, reduces inter-run variance, and ensures more consistent rankings.
arXiv Detail & Related papers (2025-10-10T11:40:27Z) - Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules [9.332823269318842]
Scaling laws have emerged as a unifying lens for understanding and guiding the training of large language models. We establish a Functional Scaling Law that captures the full loss trajectory under arbitrary LRSs. We derive explicit scaling relations in both data- and compute-limited regimes.
arXiv Detail & Related papers (2025-09-23T16:05:16Z) - The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions [51.68215326304272]
We show that even small perturbations reliably cause otherwise identical training trajectories to diverge, an effect that diminishes rapidly over training time. Our findings provide insights into neural network training stability, with practical implications for fine-tuning, model merging, and diversity of model ensembles.
arXiv Detail & Related papers (2025-06-16T08:35:16Z) - In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention [52.159541540613915]
We study how multi-head softmax attention models are trained to perform in-context learning on linear data. Our results reveal that in-context learning ability emerges from the trained transformer as an aggregated effect of its architecture and the underlying data distribution.
arXiv Detail & Related papers (2025-03-17T02:00:49Z) - Training Dynamics of Nonlinear Contrastive Learning Model in the High Dimensional Limit [1.7597525104451157]
An empirical distribution of the model weights converges to a deterministic measure governed by a McKean-Vlasov nonlinear partial differential equation (PDE). Under L2 regularization, this PDE reduces to a closed set of low-dimensional ordinary differential equations (ODEs). We analyze the locations and stability of the fixed points of these ODEs, unveiling several interesting findings.
arXiv Detail & Related papers (2024-06-11T03:07:41Z) - Robust Learning with Progressive Data Expansion Against Spurious
Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - Near-optimal Offline Reinforcement Learning with Linear Representation:
Leveraging Variance Information with Pessimism [65.46524775457928]
Offline reinforcement learning seeks to utilize offline/historical data to optimize sequential decision-making strategies.
We study the statistical limits of offline reinforcement learning with linear model representations.
arXiv Detail & Related papers (2022-03-11T09:00:12Z) - GELATO: Geometrically Enriched Latent Model for Offline Reinforcement
Learning [54.291331971813364]
Offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out-of-distribution samples and the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.