Markov Chain Monte Carlo-Based Machine Unlearning: Unlearning What Needs
to be Forgotten
- URL: http://arxiv.org/abs/2202.13585v1
- Date: Mon, 28 Feb 2022 07:14:34 GMT
- Title: Markov Chain Monte Carlo-Based Machine Unlearning: Unlearning What Needs
to be Forgotten
- Authors: Quoc Phong Nguyen, Ryutaro Oikawa, Dinil Mon Divakaran, Mun Choon
Chan, Bryan Kian Hsiang Low
- Abstract summary: This paper presents a Markov chain Monte Carlo-based machine unlearning (MCU) algorithm.
MCU helps to effectively and efficiently unlearn a trained model from subsets of the training dataset.
We empirically evaluate the performance of our proposed MCU algorithm on real-world phishing and diabetes datasets.
- Score: 31.624662214658446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the use of machine learning (ML) models is becoming increasingly popular
in many real-world applications, there are practical challenges that need to be
addressed for model maintenance. One such challenge is to 'undo' the effect of
a specific subset of dataset used for training a model. This specific subset
may contain malicious or adversarial data injected by an attacker, which
affects the model performance. Another reason may be the need for a service
provider to remove data pertaining to a specific user to respect the user's
privacy. In both cases, the problem is to 'unlearn' a specific subset of the
training data from a trained model without incurring the costly procedure of
retraining the whole model from scratch. Towards this goal, this paper presents
a Markov chain Monte Carlo-based machine unlearning (MCU) algorithm. MCU helps
to effectively and efficiently unlearn a trained model from subsets of the training
dataset. Furthermore, we show that with MCU, we are able to explain the effect
of a subset of a training dataset on the model prediction. Thus, MCU is useful
for examining subsets of data to identify the adversarial data to be removed.
Similarly, MCU can be used to erase the lineage of a user's personal data from
trained ML models, thus upholding a user's "right to be forgotten". We
empirically evaluate the performance of our proposed MCU algorithm on
real-world phishing and diabetes datasets. Results show that MCU can achieve a
desirable performance by efficiently removing the effect of a subset of the
training dataset and outperform an existing algorithm that utilizes the
remaining dataset.
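The abstract above does not spell out the algorithm's details, but the core idea of MCMC-based unlearning can be illustrated on a toy problem. The sketch below is a hypothetical reading, not the authors' method: a random-walk Metropolis sampler targets the posterior conditioned only on the retained data, initialized at the parameters fitted on the full dataset, so the chain "forgets" the erased subset without retraining machinery beyond the sampler itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative only): a 1-D Gaussian mean model.
# "Unlearning" = sampling the posterior over the mean given only
# the retained points, starting from the full-data estimate.
full_data = rng.normal(loc=2.0, scale=1.0, size=100)
forget_idx = np.arange(20)               # subset to be forgotten
retain = np.delete(full_data, forget_idx)

def log_posterior(theta, data, sigma=1.0, prior_scale=10.0):
    """Log posterior of the mean under a broad Gaussian prior."""
    log_lik = -0.5 * np.sum((data - theta) ** 2) / sigma**2
    log_prior = -0.5 * theta**2 / prior_scale**2
    return log_lik + log_prior

def metropolis_unlearn(init_theta, data, n_steps=5000, step=0.1):
    """Random-walk Metropolis targeting the retain-set posterior."""
    theta, lp = init_theta, log_posterior(init_theta, data)
    samples = []
    for _ in range(n_steps):
        prop = theta + rng.normal(scale=step)
        lp_prop = log_posterior(prop, data)
        if np.log(rng.uniform()) < lp_prop - lp:  # MH accept rule
            theta, lp = prop, lp_prop
        samples.append(theta)
    return np.array(samples)

theta_full = full_data.mean()            # model trained on all data
chain = metropolis_unlearn(theta_full, retain)
theta_unlearned = chain[1000:].mean()    # discard burn-in
```

Because the chain's stationary distribution depends only on the retained data, the unlearned estimate concentrates around the retain-set posterior regardless of where the full-data fit started.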
Related papers
- Efficient Machine Unlearning via Influence Approximation [75.31015485113993]
Influence-based unlearning has emerged as a prominent approach to estimate the impact of individual training samples on model parameters without retraining. This paper establishes a theoretical link between memorizing (incremental learning) and forgetting (unlearning). We introduce the Influence Approximation Unlearning algorithm for efficient machine unlearning from the incremental perspective.
arXiv Detail & Related papers (2025-07-31T05:34:27Z) - Forget-MI: Machine Unlearning for Forgetting Multimodal Information in Healthcare Settings [5.200386658850142]
Forget-MI is a novel machine unlearning method for multimodal medical data. We evaluate our results using performance on the forget dataset, performance on the test dataset, and Membership Inference Attack (MIA). Our approach reduces MIA by 0.202 and decreases AUC and F1 scores on the forget set by 0.221 and 0.305, respectively.
arXiv Detail & Related papers (2025-06-29T08:53:23Z) - DUPRE: Data Utility Prediction for Efficient Data Valuation [49.60564885180563]
Cooperative game theory-based data valuation, such as Data Shapley, requires evaluating the data utility and retraining the ML model for multiple data subsets.
Our framework, DUPRE, takes an alternative yet complementary approach that reduces the cost per subset evaluation by predicting data utilities instead of evaluating them by model retraining.
Specifically, given the evaluated data utilities of some data subsets, DUPRE fits a Gaussian process (GP) regression model to predict the utility of every other data subset.
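The summary above describes GP regression over subset utilities. A minimal self-contained sketch of that idea follows; the subset featurization (here just subset size) and the utility function are hypothetical stand-ins, not DUPRE's actual design.

```python
import numpy as np

# Idea, roughly: evaluate the utility (e.g., retrained-model
# accuracy) of a few data subsets, fit a GP on those evaluations,
# and predict the utility of other subsets without retraining.
def rbf_kernel(a, b, length=20.0):
    """Squared-exponential kernel between two 1-D feature arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

# Evaluated subsets: (size, measured utility) pairs.
# The utility curve is a synthetic stand-in for real evaluations.
sizes = np.array([10.0, 30.0, 50.0, 70.0, 90.0])
utils = 1.0 - np.exp(-sizes / 40.0)

def gp_predict(x_new, x_obs, y_obs, noise=1e-4):
    """Closed-form GP posterior mean at x_new (zero prior mean)."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_star = rbf_kernel(np.atleast_1d(x_new), x_obs)
    return k_star @ np.linalg.solve(K, y_obs)

# Predict the utility of an unevaluated subset of size 60.
pred = gp_predict(np.array([60.0]), sizes, utils)[0]
```

Each prediction here costs one kernel solve instead of one full model retraining, which is the economy the paper's framework targets.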
arXiv Detail & Related papers (2025-02-22T08:53:39Z) - Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing the effect of a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
Existing approaches require re-training models on different data subsets, which is computationally intensive.
This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
arXiv Detail & Related papers (2024-06-16T17:09:24Z) - Partially Blinded Unlearning: Class Unlearning for Deep Networks a Bayesian Perspective [4.31734012105466]
Machine Unlearning is the process of selectively discarding information designated to specific sets or classes of data from a pre-trained model.
We propose a methodology tailored for the purposeful elimination of information linked to a specific class of data from a pre-trained classification network.
Our novel approach, termed Partially-Blinded Unlearning (PBU), surpasses existing state-of-the-art class unlearning methods, demonstrating superior effectiveness.
arXiv Detail & Related papers (2024-03-24T17:33:22Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient based learning method, named as Projected-Gradient Unlearning (PGU)
We provide empirical evidence to demonstrate that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - Machine Unlearning for Causal Inference [0.6621714555125157]
It is important to enable the model to forget some of the information it has learned or captured about a given user (machine unlearning).
This paper introduces the concept of machine unlearning for causal inference, particularly propensity score matching and treatment effect estimation.
The dataset used in the study is the Lalonde dataset, a widely used dataset for evaluating the effectiveness of job training programs.
arXiv Detail & Related papers (2023-08-24T17:27:01Z) - AI Model Disgorgement: Methods and Choices [127.54319351058167]
We introduce a taxonomy of possible disgorgement methods that are applicable to modern machine learning systems.
We investigate the meaning of "removing the effects" of data in the trained model in a way that does not require retraining from scratch.
arXiv Detail & Related papers (2023-04-07T08:50:18Z) - Learning to be a Statistician: Learned Estimator for Number of Distinct
Values [54.629042119819744]
Estimating the number of distinct values (NDV) in a column is useful for many tasks in database systems.
In this work, we focus on how to derive accurate NDV estimations from random (online/offline) samples.
We propose to formulate the NDV estimation task in a supervised learning framework, and aim to learn a model as the estimator.
arXiv Detail & Related papers (2022-02-06T15:42:04Z) - Zero-Shot Machine Unlearning [6.884272840652062]
Modern privacy regulations grant citizens the right to be forgotten by products, services and companies.
No data related to the training process or training samples may be accessible for the unlearning purpose.
We propose two novel solutions for zero-shot machine unlearning based on (a) error minimizing-maximizing noise and (b) gated knowledge transfer.
arXiv Detail & Related papers (2022-01-14T19:16:09Z) - Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first method for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
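The last entry above mentions unlearning via closed-form parameter updates built on influence functions. For linear models such an update can even be exact; the sketch below is an illustrative special case (ridge regression with a Sherman-Morrison rank-1 downdate), not the paper's general influence-function method.

```python
import numpy as np

rng = np.random.default_rng(2)

# Fit ridge regression on synthetic data, keeping the inverse
# Hessian so that single points can later be removed in closed form.
n, d, lam = 200, 5, 1.0
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

A = X.T @ X + lam * np.eye(d)            # Hessian of the ridge loss
A_inv = np.linalg.inv(A)
b = X.T @ y
theta = A_inv @ b                        # trained parameters

def unlearn_point(A_inv, b, x, y0):
    """Exactly remove training point (x, y0) from the fitted model
    via the Sherman-Morrison identity for (A - x x^T)^{-1}."""
    Ax = A_inv @ x
    A_inv_new = A_inv + np.outer(Ax, Ax) / (1.0 - x @ Ax)
    b_new = b - y0 * x
    return A_inv_new, b_new, A_inv_new @ b_new

_, _, theta_unlearned = unlearn_point(A_inv, b, X[0], y[0])

# Reference: retrain from scratch on the remaining n-1 points.
X_r, y_r = X[1:], y[1:]
theta_retrain = np.linalg.solve(X_r.T @ X_r + lam * np.eye(d),
                                X_r.T @ y_r)
```

For this model class the downdate reproduces full retraining exactly at the cost of one rank-1 update; for deep networks, influence-function approaches approximate the analogous Hessian-inverse-vector product.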