Related papers: The Utility and Complexity of in- and out-of-Distribution Machine Unlearning

URL: http://arxiv.org/abs/2412.09119v2
Date: Wed, 12 Feb 2025 09:38:31 GMT
Title: The Utility and Complexity of in- and out-of-Distribution Machine Unlearning
Authors: Youssef Allouah, Joshua Kazdan, Rachid Guerraoui, Sanmi Koyejo
Abstract summary: We analyze the fundamental utility, time, and space complexity trade-offs of approximate unlearning. We propose a new robust and noisy gradient descent variant that provably amortizes unlearning time complexity without compromising utility.
Score: 16.879887267565742
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine unlearning, the process of selectively removing data from trained models, is increasingly crucial for addressing privacy concerns and knowledge gaps post-deployment. Despite this importance, existing approaches are often heuristic and lack formal guarantees. In this paper, we analyze the fundamental utility, time, and space complexity trade-offs of approximate unlearning, providing rigorous certification analogous to differential privacy. For in-distribution forget data -- data similar to the retain set -- we show that a surprisingly simple and general procedure, empirical risk minimization with output perturbation, achieves tight unlearning-utility-complexity trade-offs, addressing a previous theoretical gap on the separation from unlearning "for free" via differential privacy, which inherently facilitates the removal of such data. However, such techniques fail with out-of-distribution forget data -- data significantly different from the retain set -- where unlearning time complexity can exceed that of retraining, even for a single sample. To address this, we propose a new robust and noisy gradient descent variant that provably amortizes unlearning time complexity without compromising utility.
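The abstract's "empirical risk minimization with output perturbation" can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the scalar ridge-regression loss, the function names `ridge_erm` and `unlearn`, the step size, and the noise scale `sigma`, which is an arbitrary placeholder and is not calibrated to any unlearning guarantee. It is not the paper's algorithm or analysis, only the general shape of the idea: re-minimize the risk on the retain set, starting from the already trained model, then add Gaussian noise to the result.

```python
import random


def ridge_erm(xs, ys, lam):
    """Closed-form minimizer of (1/2n) * sum((w*x - y)^2) + (lam/2) * w^2."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + n * lam)


def unlearn(xs, ys, w_start, forget_idx, lam, sigma, lr=0.1, steps=200, rng=random):
    """Hypothetical helper: minimize the retain-set risk from w_start, then add noise."""
    forget = set(forget_idx)
    # Retain set = all samples except those requested for removal.
    retain = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in forget]
    n = len(retain)
    w = w_start
    for _ in range(steps):
        # Gradient of the regularized retain-set risk at the current w.
        grad = sum(x * (w * x - y) for x, y in retain) / n + lam * w
        w -= lr * grad
    # Output perturbation: Gaussian noise with (uncalibrated) standard deviation sigma.
    return w + rng.gauss(0.0, sigma)


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
ys = [3.0 * x + rng.gauss(0.0, 0.1) for x in xs]

w_full = ridge_erm(xs, ys, lam=0.01)  # model trained on all data
w_unlearned = unlearn(xs, ys, w_full, forget_idx=[0, 1, 2],
                      lam=0.01, sigma=0.05, lr=1.0, rng=rng)
```

Warm-starting from `w_full` is the point of the sketch: when the forget samples resemble the retain set, the retain-set minimizer is close to the trained model, so few steps are needed. The abstract notes that this stops being true for out-of-distribution forget data.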

Related papers

Efficient Machine Unlearning via Influence Approximation [75.31015485113993]
Influence-based unlearning has emerged as a prominent approach to estimate the impact of individual training samples on model parameters without retraining. This paper establishes a theoretical link between memorizing (incremental learning) and forgetting (unlearning). We introduce the Influence Approximation Unlearning algorithm for efficient machine unlearning from the incremental perspective.
arXiv Detail & Related papers (2025-07-31T05:34:27Z)
Robust Molecular Property Prediction via Densifying Scarce Labeled Data [51.55434084913129]
In drug discovery, compounds most critical for advancing research often lie beyond the training set. We propose a novel meta-learning-based approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data. We demonstrate significant performance gains on challenging real-world datasets.
arXiv Detail & Related papers (2025-06-13T15:27:40Z)
When to Forget? Complexity Trade-offs in Machine Unlearning [23.507879460531264]
Machine Unlearning (MU) aims at removing the influence of specific data points from a trained model. We analyze the efficiency of unlearning methods and establish the first upper and lower bounds on minimax times for this problem. We provide a phase diagram for the unlearning complexity ratio -- a novel metric that compares the computational cost of the best unlearning method to full model retraining.
arXiv Detail & Related papers (2025-02-24T16:56:27Z)
Adversarial Mixup Unlearning [16.89710766008491]
We introduce a novel approach that regularizes the unlearning process by utilizing synthesized mixup samples. At the core of our approach is a generator-unlearner framework, MixUnlearn. We show that our method significantly outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2025-02-14T16:50:33Z)
Machine Unlearning via Information Theoretic Regularization [3.05179671246628]
We introduce a mathematical framework based on information-theoretic regularization to address both feature and data point unlearning. By combining flexibility in learning objectives with simplicity in regularization design, our approach is highly adaptable and practical for a wide range of machine learning and AI applications.
arXiv Detail & Related papers (2025-02-08T20:33:06Z)
Distribution-Level Feature Distancing for Machine Unlearning: Towards a Better Trade-off Between Model Utility and Forgetting [4.220336689294245]
We propose Distribution-Level Feature Distancing (DLFD), a novel method that efficiently forgets instances while preserving task-relevant feature correlations. Our method synthesizes data samples by optimizing the feature distribution to be distinctly different from that of forget samples, achieving effective results within a single training epoch.
arXiv Detail & Related papers (2024-09-23T06:51:10Z)
Accelerated Stochastic ExtraGradient: Mixing Hessian and Gradient Similarity to Reduce Communication in Distributed and Federated Learning [50.382793324572845]
Distributed computing involves communication between devices, which requires solving two key problems: efficiency and privacy. In this paper, we analyze a new method that incorporates the ideas of using data similarity and clients sampling. To address privacy concerns, we apply the technique of additional noise and analyze its impact on the convergence of the proposed method.
arXiv Detail & Related papers (2024-09-22T00:49:10Z)
Dataset Condensation Driven Machine Unlearning [0.0]
Current trends in data regulation requirements and privacy-preserving machine learning have emphasized the importance of machine unlearning. We propose new dataset condensation techniques and an innovative unlearning scheme that strikes a balance between machine unlearning privacy, utility, and efficiency. We present a novel and effective approach to instrumenting machine unlearning and propose its application in defending against membership inference and model inversion attacks.
arXiv Detail & Related papers (2024-01-31T21:48:25Z)
Langevin Unlearning: A New Perspective of Noisy Gradient Descent for Machine Unlearning [20.546589699647416]
Privacy is defined as statistical indistinguishability from retraining from scratch. We propose Langevin unlearning, an unlearning framework based on noisy gradient descent.
arXiv Detail & Related papers (2024-01-18T20:35:47Z)
Heterogeneous Target Speech Separation [52.05046029743995]
We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts. Our proposed heterogeneous separation framework can seamlessly leverage datasets with large distribution shifts.
arXiv Detail & Related papers (2022-04-07T17:14:20Z)
Non-IID data and Continual Learning processes in Federated Learning: A long road ahead [58.720142291102135]
Federated Learning is a novel framework that allows multiple devices or institutions to train a machine learning model collaboratively while keeping their data private. In this work, we formally classify data statistical heterogeneity and review the most remarkable learning strategies that are able to face it. At the same time, we introduce approaches from other machine learning frameworks, such as Continual Learning, that also deal with data heterogeneity and could be easily adapted to the Federated Learning settings.
arXiv Detail & Related papers (2021-11-26T09:57:11Z)
On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning [69.48387059607387]
We consider the problem of using expert data with unobserved confounders for imitation and reinforcement learning. We analyze the limitations of learning from confounded expert data with and without external reward. We validate our claims empirically on challenging assistive healthcare and recommender system simulation tasks.
arXiv Detail & Related papers (2021-10-13T07:31:31Z)
Contrastive learning of strong-mixing continuous-time stochastic processes [53.82893653745542]
Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data. We show that a properly constructed contrastive learning task can be used to estimate the transition kernel for small-to-mid-range intervals in the diffusion case.
arXiv Detail & Related papers (2021-03-03T23:06:47Z)
Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method. We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.