Approximate Data Deletion from Machine Learning Models
- URL: http://arxiv.org/abs/2002.10077v2
- Date: Tue, 23 Feb 2021 18:56:03 GMT
- Title: Approximate Data Deletion from Machine Learning Models
- Authors: Zachary Izzo, Mary Anne Smart, Kamalika Chaudhuri, James Zou
- Abstract summary: Deleting data from a trained machine learning (ML) model is a critical task in many applications.
We propose a new approximate deletion method for linear and logistic models.
We also develop a new feature-injection test to evaluate the thoroughness of data deletion from ML models.
- Score: 31.689174311625084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deleting data from a trained machine learning (ML) model is a critical task
in many applications. For example, we may want to remove the influence of
training points that might be out of date or outliers. Regulations such as EU's
General Data Protection Regulation also stipulate that individuals can request
to have their data deleted. The naive approach to data deletion is to retrain
the ML model on the remaining data, but this is too time consuming. In this
work, we propose a new approximate deletion method for linear and logistic
models whose computational cost is linear in the feature dimension $d$ and
independent of the number of training points $n$. This is a significant gain
over existing methods, which all have superlinear time dependence on the
dimension. We also develop a new feature-injection test to evaluate the
thoroughness of data deletion from ML models.
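The paper's projective residual update is not reproduced in this summary, so as orientation, here is a minimal sketch of the classical exact baseline such methods improve on: removing a single point from a ridge regression via a Sherman-Morrison rank-one downdate. This costs $O(d^2)$ per deletion instead of a full retrain; the paper's method brings the per-deletion cost down further, to $O(d)$. All function names below are illustrative.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Fit ridge regression, caching the quantities needed for fast deletion."""
    d = X.shape[1]
    A_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))  # (X^T X + lam*I)^{-1}
    b = X.T @ y
    return A_inv @ b, A_inv, b  # theta, cached inverse, cached X^T y

def delete_point(A_inv, b, x, y):
    """Exactly remove one training point (x, y) in O(d^2) via the
    Sherman-Morrison rank-one downdate, avoiding a full retrain."""
    Ax = A_inv @ x
    A_inv = A_inv + np.outer(Ax, Ax) / (1.0 - x @ Ax)  # (A - x x^T)^{-1}
    b = b - y * x
    return A_inv @ b, A_inv, b
```

The updated parameters match retraining on the remaining data up to floating-point error, which makes this a convenient reference point when evaluating faster approximate methods.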
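The feature-injection test can be sketched in the same setting, reusing the illustrative helpers above: append a synthetic feature that only the to-be-deleted points carry and that correlates with their labels; after a thorough deletion, the learned weight on that feature should collapse toward zero.

```python
def feature_injection_test(X, y, delete_idx, lam=1.0):
    """Inject a synthetic feature carried only by the to-be-deleted points and
    compare its learned weight before and after deletion; a large residual
    weight indicates the deletion was not thorough."""
    inj = np.zeros(X.shape[0])
    inj[delete_idx] = np.sign(y[delete_idx])  # correlated with deleted labels
    X_aug = np.hstack([X, inj[:, None]])
    theta, A_inv, b = fit_ridge(X_aug, y, lam)
    w_before = theta[-1]
    for i in delete_idx:
        theta, A_inv, b = delete_point(A_inv, b, X_aug[i], y[i])
    return w_before, theta[-1]  # injected-feature weight before vs. after
```

With the exact downdate the after-deletion weight is (numerically) zero, since no remaining point carries the injected feature; substituting an approximate deletion method in the loop turns the residual weight into a measure of how thoroughly the deleted points were scrubbed.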
Related papers
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU); a toy sketch of the gradient-projection idea appears after this list.
We provide empirical evidence that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - AI Model Disgorgement: Methods and Choices [127.54319351058167]
We introduce a taxonomy of possible disgorgement methods that are applicable to modern machine learning systems.
We investigate the meaning of "removing the effects" of data in the trained model in a way that does not require retraining from scratch.
arXiv Detail & Related papers (2023-04-07T08:50:18Z) - Machine Unlearning Method Based On Projection Residual [23.24026891609028]
This paper adopts a projection residual method based on Newton's method.
The goal is to perform machine unlearning in both linear regression models and neural network models.
Experiments show that this method deletes data more thoroughly, with results close to retraining the model.
arXiv Detail & Related papers (2022-09-30T07:29:55Z) - Datamodels: Predicting Predictions from Training Data [86.66720175866415]
We present a conceptual framework, datamodeling, for analyzing the behavior of a model class in terms of the training data.
We show that even simple linear datamodels can successfully predict model outputs.
arXiv Detail & Related papers (2022-02-01T18:15:24Z) - Zero-Shot Machine Unlearning [6.884272840652062]
Modern privacy regulations grant citizens the right to be forgotten by products, services and companies.
In this zero-shot setting, no data related to the training process or the training samples is accessible for unlearning.
We propose two novel solutions for zero-shot machine unlearning based on (a) error minimizing-maximizing noise and (b) gated knowledge transfer.
arXiv Detail & Related papers (2022-01-14T19:16:09Z) - Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first method for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters; a generic sketch of this style of update appears after this list.
arXiv Detail & Related papers (2021-08-26T04:42:24Z) - Certifiable Machine Unlearning for Linear Models [1.484852576248587]
Machine unlearning is the task of updating machine learning (ML) models after a subset of the training data they were trained on is deleted.
We present an experimental study of the three state-of-the-art approximate unlearning methods for linear models.
arXiv Detail & Related papers (2021-06-29T05:05:58Z) - Supervised Machine Learning with Plausible Deniability [1.685485565763117]
We study the question of how well machine learning (ML) models trained on a certain data set provide privacy for the training data.
We show that one can take a set of purely random training data and from it define a suitable "learning rule" that will produce an ML model exactly equal to a given target model $f$.
arXiv Detail & Related papers (2021-06-08T11:54:51Z) - Certified Data Removal from Machine Learning Models [79.91502073022602]
Good data stewardship requires removal of data at the request of the data's owner.
This raises the question if and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request.
We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with (stated formally after this list).
arXiv Detail & Related papers (2019-11-08T03:57:41Z)
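For the Projected-Gradient Unlearning (PGU) entry above, the core idea can be illustrated as projecting each unlearning gradient onto the orthogonal complement of the subspace spanned by retained-data gradients, so the update leaves directions that matter for the retained data untouched. This is a toy sketch of gradient projection in general, not the paper's exact algorithm; all names are illustrative.

```python
import numpy as np

def project_out_retain(g_forget, G_retain):
    """Project an unlearning gradient onto the orthogonal complement of the
    subspace spanned by retained-data gradients (rows of G_retain)."""
    Q, _ = np.linalg.qr(G_retain.T)          # orthonormal basis of the span
    return g_forget - Q @ (Q.T @ g_forget)   # strip components along the span

# One unlearning step then moves the parameters only in directions
# orthogonal to the retained-data gradients, e.g.:
# theta -= lr * project_out_retain(grad_on_forget_set(theta), retain_grads)
```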
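The projection-residual and features-and-labels entries above both build on a Newton-step, influence-function-style update: starting from the trained parameters, add back the inverse-Hessian-weighted gradients of the deleted points. A minimal sketch for L2-regularized logistic regression follows; it shows the generic recipe, not any one paper's exact algorithm, and all names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_unlearn(theta, X_forget, y_forget, X_retain, y_retain, lam=1e-2):
    """One-step unlearning for L2-regularized logistic regression (y in {0,1}):
    theta' = theta + H_retain^{-1} * (sum of per-example gradients of the
    deleted points), where H_retain is the loss Hessian on the retained data."""
    g = X_forget.T @ (sigmoid(X_forget @ theta) - y_forget)   # summed gradients
    p = sigmoid(X_retain @ theta)
    H = (X_retain.T * (p * (1 - p))) @ X_retain + lam * np.eye(len(theta))
    return theta + np.linalg.solve(H, g)
```

The step is exact for quadratic losses and only approximate otherwise, which is why works such as the certified-removal paper above additionally bound the error the approximation leaves behind.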
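Finally, the certified-removal guarantee referenced in the last entry can be stated formally. Up to minor notational changes from that paper: a removal mechanism $M$ performs $\epsilon$-certified removal for a (randomized) learning algorithm $A$ if, for every measurable set of models $\mathcal{T}$, dataset $D$, and point $x \in D$,

$$ e^{-\epsilon} \;\le\; \frac{P\big(M(A(D), D, x) \in \mathcal{T}\big)}{P\big(A(D \setminus \{x\}) \in \mathcal{T}\big)} \;\le\; e^{\epsilon}, $$

i.e., the post-removal model is statistically indistinguishable, up to a factor of $e^{\pm\epsilon}$, from a model retrained without $x$.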
This list is automatically generated from the titles and abstracts of the papers in this site.