Remaining-data-free Machine Unlearning by Suppressing Sample Contribution
- URL: http://arxiv.org/abs/2402.15109v3
- Date: Fri, 06 Dec 2024 15:06:19 GMT
- Title: Remaining-data-free Machine Unlearning by Suppressing Sample Contribution
- Authors: Xinwen Cheng, Zhehao Huang, Wenxin Zhou, Zhengbao He, Ruikai Yang, Yingwen Wu, Xiaolin Huang,
- Abstract summary: Un unlearned model should approach the retrained model, where the forgetting data are not involved in the training process and hence do not contribute to the retrained model.<n>We propose MU-Mis (Machine Unlearning by Minimizing input sensitivity) to suppress the contribution of the forgetting data.<n>It is the first time that a remaining-data-free method can outperform state-of-the-art unlearning methods that utilize the remaining data.
- Score: 22.30844094734722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine unlearning (MU) is to forget data from a well-trained model, which is practically important due to the ``right to be forgotten''. The unlearned model should approach the retrained model, where the forgetting data are not involved in the training process and hence do not contribute to the retrained model. Considering the forgetting data's absence during retraining, we think unlearning should withdraw their contribution from the pre-trained model. The challenge is that when tracing the learning process is impractical, how to quantify and detach sample's contribution to the dynamic learning process using only the pre-trained model. We first theoretically discover that sample's contribution during the process will reflect in the learned model's sensitivity to it. We then practically design a novel method, namely MU-Mis (Machine Unlearning by Minimizing input sensitivity), to suppress the contribution of the forgetting data. Experimental results demonstrate that MU-Mis can unlearn effectively and efficiently without utilizing the remaining data. It is the first time that a remaining-data-free method can outperform state-of-the-art (SoTA) unlearning methods that utilize the remaining data.
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing a small "forget set" training data on a pre-divertrained machine learning model -- has recently attracted interest.
Recent research shows that machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Deep Unlearn: Benchmarking Machine Unlearning [7.450700594277741]
Machine unlearning (MU) aims to remove the influence of particular data points from the learnable parameters of a trained machine learning model.
This paper investigates 18 state-of-the-art MU methods across various benchmark datasets and models.
arXiv Detail & Related papers (2024-10-02T06:41:58Z) - Label Smoothing Improves Machine Unlearning [29.611981055071197]
This work introduces UGradSL, a simple, plug-and-play machine unlearning approach that uses smoothed labels.
The consistent improvement in MU performance is only at a marginal cost of additional computations.
arXiv Detail & Related papers (2024-06-11T20:26:26Z) - Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning [9.998859702421417]
Machine unlearning (MU) aims to eliminate the influence of chosen data points on model performance.
Despite various MU methods for data influence erasure, evaluations have largely focused on random data forgetting.
We propose identifying the data subset that presents the most significant challenge for influence erasure, pinpointing the worst-case forget set.
arXiv Detail & Related papers (2024-03-12T06:50:32Z) - An Information Theoretic Approach to Machine Unlearning [43.423418819707784]
To comply with AI and data regulations, the need to forget private or copyrighted information from trained machine learning models is increasingly important.
In this work, we address the zero-shot unlearning scenario, whereby an unlearning algorithm must be able to remove data given only a trained model and the data to be forgotten.
We derive a simple but principled zero-shot unlearning method based on the geometry of the model.
arXiv Detail & Related papers (2024-02-02T13:33:30Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
Challenge is to discard information about the forget'' data without altering knowledge about remaining dataset.
We adopt a projected-gradient based learning method, named as Projected-Gradient Unlearning (PGU)
We provide empirically evidence to demonstrate that our unlearning method can produce models that behave similar to models retrained from scratch across various metrics even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation [30.168665935074166]
We introduce the concept of 'weight saliency' for machine unlearning, drawing parallels with input saliency in model explanation.
The resultant method that we call saliency unlearning (SalUn) narrows the performance gap with 'exact' unlearning.
SalUn is the first principled MU approach that can effectively erase the influence of forgetting data, classes, or concepts in both image classification and generation tasks.
arXiv Detail & Related papers (2023-10-19T06:17:17Z) - Machine Unlearning Methodology base on Stochastic Teacher Network [33.763901254862766]
"Right to be forgotten" grants data owners the right to actively withdraw data that has been used for model training.
Existing machine unlearning methods have been found to be ineffective in quickly removing knowledge from deep learning models.
This paper proposes using a network as a teacher to expedite the mitigation of the influence caused by forgotten data on the model.
arXiv Detail & Related papers (2023-08-28T06:05:23Z) - Machine Learning Force Fields with Data Cost Aware Training [94.78998399180519]
Machine learning force fields (MLFF) have been proposed to accelerate molecular dynamics (MD) simulation.
Even for the most data-efficient MLFFs, reaching chemical accuracy can require hundreds of frames of force and energy labels.
We propose a multi-stage computational framework -- ASTEROID, which lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data.
arXiv Detail & Related papers (2023-06-05T04:34:54Z) - AI Model Disgorgement: Methods and Choices [127.54319351058167]
We introduce a taxonomy of possible disgorgement methods that are applicable to modern machine learning systems.
We investigate the meaning of "removing the effects" of data in the trained model in a way that does not require retraining from scratch.
arXiv Detail & Related papers (2023-04-07T08:50:18Z) - Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z) - Machine Unlearning Method Based On Projection Residual [23.24026891609028]
This paper adopts the projection residual method based on Newton method.
The main purpose is to implement machine unlearning tasks in the context of linear regression models and neural network models.
Experiments show that this method is more thorough in deleting data, which is close to model retraining.
arXiv Detail & Related papers (2022-09-30T07:29:55Z) - Boosting Facial Expression Recognition by A Semi-Supervised Progressive
Teacher [54.50747989860957]
We propose a semi-supervised learning algorithm named Progressive Teacher (PT) to utilize reliable FER datasets as well as large-scale unlabeled expression images for effective training.
Experiments on widely-used databases RAF-DB and FERPlus validate the effectiveness of our method, which achieves state-of-the-art performance with accuracy of 89.57% on RAF-DB.
arXiv Detail & Related papers (2022-05-28T07:47:53Z) - Zero-Shot Machine Unlearning [6.884272840652062]
Modern privacy regulations grant citizens the right to be forgotten by products, services and companies.
No data related to the training process or training samples may be accessible for the unlearning purpose.
We propose two novel solutions for zero-shot machine unlearning based on (a) error minimizing-maximizing noise and (b) gated knowledge transfer.
arXiv Detail & Related papers (2022-01-14T19:16:09Z) - Machine Unlearning of Features and Labels [72.81914952849334]
We propose first scenarios for unlearning and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z) - SSSE: Efficiently Erasing Samples from Trained Machine Learning Models [103.43466657962242]
We propose an efficient and effective algorithm, SSSE, for samples erasure.
In certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch with only the permitted data.
arXiv Detail & Related papers (2021-07-08T14:17:24Z) - Variational Bayesian Unlearning [54.26984662139516]
We study the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased.
We show that it is equivalent to minimizing an evidence upper bound which trades off between fully unlearning from erased data vs. not entirely forgetting the posterior belief.
In model training with VI, only an approximate (instead of exact) posterior belief given the full data can be obtained, which makes unlearning even more challenging.
arXiv Detail & Related papers (2020-10-24T11:53:00Z) - How Does Data Augmentation Affect Privacy in Machine Learning? [94.52721115660626]
We propose new MI attacks to utilize the information of augmented data.
We establish the optimal membership inference when the model is trained with augmented data.
arXiv Detail & Related papers (2020-07-21T02:21:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.