Certified Data Removal from Machine Learning Models
- URL: http://arxiv.org/abs/1911.03030v6
- Date: Wed, 8 Nov 2023 03:57:25 GMT
- Title: Certified Data Removal from Machine Learning Models
- Authors: Chuan Guo, Tom Goldstein, Awni Hannun, Laurens van der Maaten
- Abstract summary: Good data stewardship requires removal of data at the request of the data's owner.
This raises the question if and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request.
We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with.
- Score: 79.91502073022602
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Good data stewardship requires removal of data at the request of the data's
owner. This raises the question if and how a trained machine-learning model,
which implicitly stores information about its training data, should be affected
by such a removal request. Is it possible to "remove" data from a
machine-learning model? We study this problem by defining certified removal: a
very strong theoretical guarantee that a model from which data is removed
cannot be distinguished from a model that never observed the data to begin
with. We develop a certified-removal mechanism for linear classifiers and
empirically study learning settings in which this mechanism is practical.
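The certified-removal goal is easiest to see in the quadratic case. The sketch below, with illustrative function names (`fit_ridge`, `remove_point` are not from the paper), shows exact removal for L2-regularized least squares by downdating the sufficient statistics; the paper's actual mechanism for general linear classifiers uses a Newton update plus a randomized loss perturbation to mask the residual gradient.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Return (w, A) with w = A^{-1} X^T y and A = X^T X + lam*I."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y), A

def remove_point(w, A, X, y, i):
    """Exactly unlearn training point i via a rank-one downdate of A."""
    x_i, y_i = X[i], y[i]
    b = A @ w                        # recover X^T y from the stored solution
    A_new = A - np.outer(x_i, x_i)   # downdate X^T X
    b_new = b - y_i * x_i            # downdate X^T y
    return np.linalg.solve(A_new, b_new)
```

Here the removed model coincides with full retraining on the remaining data, so the two are indistinguishable by construction; for non-quadratic losses the equality becomes approximate and the certified-removal guarantee bounds what an observer can infer from the gap.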
Related papers
- Towards Aligned Data Removal via Twin Machine Unlearning [30.070660418732807]
Modern privacy regulations have spurred the evolution of machine unlearning.
We present a Twin Machine Unlearning (TMU) approach, where a twin unlearning problem is defined corresponding to the original unlearning problem.
Our approach significantly enhances the alignment between the unlearned model and the gold model.
arXiv Detail & Related papers (2024-08-21T08:42:21Z) - Releasing Malevolence from Benevolence: The Menace of Benign Data on Machine Unlearning [28.35038726318893]
Machine learning models trained on vast amounts of real or synthetic data often achieve outstanding predictive performance across various domains.
To address privacy concerns, machine unlearning has been proposed to erase specific data samples from models.
We introduce the Unlearning Usability Attack to distill data distribution information into a small set of benign data.
arXiv Detail & Related papers (2024-07-06T15:42:28Z) - UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI [50.61495097098296]
We revisit the paradigm in which unlearning is used for Large Language Models (LLMs).
We introduce a concept of ununlearning, where unlearned knowledge gets reintroduced in-context.
We argue that content filtering for impermissible knowledge will be required and even exact unlearning schemes are not enough for effective content regulation.
arXiv Detail & Related papers (2024-06-27T10:24:35Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
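The projection idea can be sketched as follows, assuming the directions important to retained data are summarized by an orthonormal basis U (built here from a QR decomposition of sampled retained-data gradients); the helper names and this particular subspace construction are illustrative, not the paper's exact procedure.

```python
import numpy as np

def retained_subspace(G_retain):
    """Orthonormal basis for the span of retained-data gradient samples
    (columns of G_retain), via thin QR. These directions must not be disturbed."""
    Q, _ = np.linalg.qr(G_retain)
    return Q

def project_update(g_forget, U):
    """Strip from the forget-gradient its component inside span(U), so the
    unlearning step is orthogonal to directions important for retained data."""
    return g_forget - U @ (U.T @ g_forget)
```

Applying `project_update` to every unlearning step keeps parameter changes in the orthogonal complement of the retained subspace, which is what limits interference with the remaining dataset.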
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - Machine Unlearning Methodology based on Stochastic Teacher Network [33.763901254862766]
The "right to be forgotten" grants data owners the right to actively withdraw data that has been used for model training.
Existing machine unlearning methods have been found to be ineffective in quickly removing knowledge from deep learning models.
This paper proposes using a network as a teacher to expedite mitigating the influence of forgotten data on the model.
arXiv Detail & Related papers (2023-08-28T06:05:23Z) - AI Model Disgorgement: Methods and Choices [127.54319351058167]
We introduce a taxonomy of possible disgorgement methods that are applicable to modern machine learning systems.
We investigate the meaning of "removing the effects" of data in the trained model in a way that does not require retraining from scratch.
arXiv Detail & Related papers (2023-04-07T08:50:18Z) - Zero-Shot Machine Unlearning [6.884272840652062]
Modern privacy regulations grant citizens the right to be forgotten by products, services and companies.
No data related to the training process or training samples may be accessible for the unlearning purpose.
We propose two novel solutions for zero-shot machine unlearning based on (a) error minimizing-maximizing noise and (b) gated knowledge transfer.
arXiv Detail & Related papers (2022-01-14T19:16:09Z) - Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first approach for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
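The closed-form idea can be sketched for an L2-regularized logistic-regression model: at the trained optimum the total gradient is zero, so deleting point i leaves a residual gradient equal to minus that point's loss gradient, and one Newton step with the remaining data's Hessian approximately undoes the point's influence. Function names here are illustrative; the paper's framework extends this influence-function reasoning to features and labels.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lam, iters=30):
    """L2-regularized logistic regression (labels in {0,1}) via Newton's method."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + lam * theta
        H = (X * (p * (1 - p))[:, None]).T @ X + lam * np.eye(X.shape[1])
        theta -= np.linalg.solve(H, grad)
    return theta

def unlearn_point(theta, X, y, lam, i):
    """Closed-form Newton step approximately removing point i's influence."""
    keep = np.arange(len(y)) != i
    Xk, yk = X[keep], y[keep]
    g_i = (sigmoid(X[i] @ theta) - y[i]) * X[i]   # removed point's loss gradient
    pk = sigmoid(Xk @ theta)
    Hk = (Xk * (pk * (1 - pk))[:, None]).T @ Xk + lam * np.eye(X.shape[1])
    return theta + np.linalg.solve(Hk, g_i)        # theta' = theta + H_keep^{-1} g_i
```

The update costs one Hessian solve instead of a full retrain, at the price of a second-order approximation error.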
arXiv Detail & Related papers (2021-08-26T04:42:24Z) - Amnesiac Machine Learning [15.680008735220785]
The recently enacted General Data Protection Regulation (GDPR) affects any data holder that has data on European Union residents.
Models are vulnerable to information leaking attacks such as model inversion attacks.
We present two data removal methods, namely Unlearning and Amnesiac Unlearning, that enable model owners to protect themselves against such attacks while being compliant with regulations.
arXiv Detail & Related papers (2020-10-21T13:14:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.