SSSE: Efficiently Erasing Samples from Trained Machine Learning Models
- URL: http://arxiv.org/abs/2107.03860v1
- Date: Thu, 8 Jul 2021 14:17:24 GMT
- Title: SSSE: Efficiently Erasing Samples from Trained Machine Learning Models
- Authors: Alexandra Peste, Dan Alistarh, Christoph H. Lampert
- Abstract summary: We propose an efficient and effective algorithm, SSSE, for samples erasure.
In certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch with only the permitted data.
- Score: 103.43466657962242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The availability of large amounts of user-provided data has been key to the
success of machine learning for many real-world tasks. Recently, an increasing
awareness has emerged that users should be given more control about how their
data is used. In particular, users should have the right to prohibit the use of
their data for training machine learning systems, and to have it erased from
already trained systems. While several sample erasure methods have been
proposed, all of them have drawbacks which have prevented them from gaining
widespread adoption. Most methods are either only applicable to very specific
families of models, sacrifice too much of the original model's accuracy, or
they have prohibitive memory or computational requirements. In this paper, we
propose an efficient and effective algorithm, SSSE, for samples erasure, that
is applicable to a wide class of machine learning models. From a second-order
analysis of the model's loss landscape we derive a closed-form update step of
the model parameters that only requires access to the data to be erased, not to
the original training set. Experiments on three datasets, CelebFaces attributes
(CelebA), Animals with Attributes 2 (AwA2) and CIFAR10, show that in certain
cases SSSE can erase samples almost as well as the optimal, yet impractical,
gold standard of training a new model from scratch with only the permitted
data.
Related papers
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
Challenge is to discard information about the forget'' data without altering knowledge about remaining dataset.
We adopt a projected-gradient based learning method, named as Projected-Gradient Unlearning (PGU)
We provide empirically evidence to demonstrate that our unlearning method can produce models that behave similar to models retrained from scratch across various metrics even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - AI Model Disgorgement: Methods and Choices [127.54319351058167]
We introduce a taxonomy of possible disgorgement methods that are applicable to modern machine learning systems.
We investigate the meaning of "removing the effects" of data in the trained model in a way that does not require retraining from scratch.
arXiv Detail & Related papers (2023-04-07T08:50:18Z) - Synthetic Model Combination: An Instance-wise Approach to Unsupervised
Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Give access to a set of expert models and their predictions alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z) - Zero-Shot Machine Unlearning [6.884272840652062]
Modern privacy regulations grant citizens the right to be forgotten by products, services and companies.
No data related to the training process or training samples may be accessible for the unlearning purpose.
We propose two novel solutions for zero-shot machine unlearning based on (a) error minimizing-maximizing noise and (b) gated knowledge transfer.
arXiv Detail & Related papers (2022-01-14T19:16:09Z) - Complementary Ensemble Learning [1.90365714903665]
We derive a technique to improve performance of state-of-the-art deep learning models.
Specifically, we train auxiliary models which are able to complement state-of-the-art model uncertainty.
arXiv Detail & Related papers (2021-11-09T03:23:05Z) - Machine Unlearning of Features and Labels [72.81914952849334]
We propose first scenarios for unlearning and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - Data Impressions: Mining Deep Models to Extract Samples for Data-free
Applications [26.48630545028405]
"Data Impressions" act as proxy to the training data and can be used to realize a variety of tasks.
We show the applicability of data impressions in solving several computer vision tasks.
arXiv Detail & Related papers (2021-01-15T11:37:29Z) - An Efficient Method of Training Small Models for Regression Problems
with Knowledge Distillation [1.433758865948252]
We propose a new formalism of knowledge distillation for regression problems.
First, we propose a new loss function, teacher outlier loss rejection, which rejects outliers in training samples using teacher model predictions.
By considering the multi-task network, training of the feature extraction of student models becomes more effective.
arXiv Detail & Related papers (2020-02-28T08:46:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.