Deletion Inference, Reconstruction, and Compliance in Machine (Un)Learning
- URL: http://arxiv.org/abs/2202.03460v1
- Date: Mon, 7 Feb 2022 19:02:58 GMT
- Title: Deletion Inference, Reconstruction, and Compliance in Machine (Un)Learning
- Authors: Ji Gao, Sanjam Garg, Mohammad Mahmoody, Prashant Nalini Vasudevan
- Abstract summary: Privacy attacks on machine learning models aim to identify the data that is used to train such models.
Many machine learning methods have recently been extended to support machine unlearning.
- Score: 21.404426803200796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Privacy attacks on machine learning models aim to identify the data that is
used to train such models. Such attacks, traditionally, are studied on static
models that are trained once and are accessible by the adversary. Motivated
by new legal requirements, many machine learning methods have recently been
extended to support machine unlearning, i.e., updating models as if certain
examples were removed from their training sets.
However, privacy attacks could potentially become more devastating in this new
setting, since an attacker could now access both the original model before
deletion and the new model after the deletion. In fact, the very act of
deletion might make the deleted record more vulnerable to privacy attacks.
Inspired by cryptographic definitions and the differential privacy framework,
we formally study privacy implications of machine unlearning. We formalize
(various forms of) deletion inference and deletion reconstruction attacks, in
which the adversary aims to either identify which record is deleted or to
reconstruct (perhaps part of) the deleted records. We then present successful
deletion inference and reconstruction attacks for a variety of machine learning
models and tasks such as classification, regression, and language models.
Finally, we show that our attacks would provably be precluded if the schemes
satisfy (variants of) Deletion Compliance (Garg, Goldwasser, and Vasudevan,
Eurocrypt '20).
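To make the threat model concrete, here is a minimal sketch of a deletion inference attack (an illustration of the setting, not a construction from the paper): the adversary queries the model published before an unlearning update and the model published after it, and guesses that the candidate record whose loss increased the most is the one that was deleted. The dataset, the retraining-based "unlearning", and the candidate pool are illustrative assumptions.

```python
# Minimal deletion-inference sketch (illustrative, not the paper's attack).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
deleted_idx = 123                          # ground truth, hidden from the adversary
model_before = LogisticRegression(max_iter=1000).fit(X, y)
keep = np.arange(len(X)) != deleted_idx
model_after = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])  # "exact" unlearning by retraining

def per_example_loss(model, X, y):
    # negative log-likelihood of the true label under the model
    p = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(p, 1e-12, None))

# The adversary holds a pool of candidate records, one of which was deleted.
rng = np.random.default_rng(0)
candidates = rng.choice(np.flatnonzero(keep), size=49, replace=False).tolist() + [deleted_idx]
Xc, yc = X[candidates], y[candidates]
increase = per_example_loss(model_after, Xc, yc) - per_example_loss(model_before, Xc, yc)
guess = candidates[int(np.argmax(increase))]   # largest loss increase: predicted deleted record
print("guessed:", guess, "true:", deleted_idx)
```

The same loss-difference signal underlies more refined attacks; the abstract's point is that access to both the pre- and post-deletion models is what makes the deleted record especially exposed.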
Related papers
- Reconstruction Attacks on Machine Unlearning: Simple Models are Vulnerable [30.22146634953896]
We show how to mount a near-perfect reconstruction attack that recovers the deleted data point from linear regression models.
Our work highlights that privacy risk is significant even for extremely simple model classes when individuals can request deletion of their data from the model.
arXiv Detail & Related papers (2024-05-30T17:27:44Z) - Learn What You Want to Unlearn: Unlearning Inversion Attacks against Machine Unlearning [16.809644622465086]
- Learn What You Want to Unlearn: Unlearning Inversion Attacks against Machine Unlearning [16.809644622465086]
We conduct the first investigation to understand the extent to which machine unlearning can leak the confidential content of unlearned data.
Under the Machine Learning as a Service setting, we propose unlearning inversion attacks that can reveal the feature and label information of an unlearned sample.
The experimental results indicate that the proposed attack can reveal the sensitive information of the unlearned data.
arXiv Detail & Related papers (2024-04-04T06:37:46Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Can Sensitive Information Be Deleted From LLMs? Objectives for Defending
Against Extraction Attacks [73.53327403684676]
We propose an attack-and-defense framework for studying the task of deleting sensitive information directly from model weights.
We study direct edits to model weights because this approach should guarantee that particular deleted information is never extracted by future prompt attacks.
We show that even state-of-the-art model editing methods such as ROME struggle to truly delete factual information from models like GPT-J, as our whitebox and blackbox attacks can recover "deleted" information from an edited model 38% of the time.
arXiv Detail & Related papers (2023-09-29T17:12:43Z) - Isolation and Induction: Training Robust Deep Neural Networks against
- Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions, which harms benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z) - RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
- RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z) - Reconstructing Training Data with Informed Adversaries [30.138217209991826]
- Reconstructing Training Data with Informed Adversaries [30.138217209991826]
Given access to a machine learning model, can an adversary reconstruct the model's training data?
This work studies this question from the lens of a powerful informed adversary who knows all the training data points except one.
We show it is feasible to reconstruct the remaining data point in this stringent threat model.
arXiv Detail & Related papers (2022-01-13T09:19:25Z) - Hard to Forget: Poisoning Attacks on Certified Machine Unlearning [13.516740881682903]
We consider an attacker aiming to increase the computational cost of data removal.
We derive and empirically investigate a poisoning attack on certified machine unlearning.
arXiv Detail & Related papers (2021-09-17T01:00:46Z) - SSSE: Efficiently Erasing Samples from Trained Machine Learning Models [103.43466657962242]
We propose an efficient and effective algorithm, SSSE, for sample erasure.
In certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch with only the permitted data.
arXiv Detail & Related papers (2021-07-08T14:17:24Z) - Amnesiac Machine Learning [15.680008735220785]
The recently enacted General Data Protection Regulation (GDPR) affects any data holder that has data on European Union residents.
Models are vulnerable to information leaking attacks such as model inversion attacks.
We present two data removal methods, namely Unlearning and Amnesiac Unlearning, that enable model owners to protect themselves against such attacks while being compliant with regulations.
arXiv Detail & Related papers (2020-10-21T13:14:17Z) - Sampling Attacks: Amplification of Membership Inference Attacks by
- Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike other standard membership adversaries, is able to work under the severe restriction of having no access to the scores of the victim model.
We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.