Membership Inference Attacks From First Principles
- URL: http://arxiv.org/abs/2112.03570v1
- Date: Tue, 7 Dec 2021 08:47:00 GMT
- Title: Membership Inference Attacks From First Principles
- Authors: Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas
Terzis, Florian Tramer
- Abstract summary: A membership inference attack allows an adversary to query a trained machine learning model to predict whether or not a particular example was contained in the model's training dataset.
These attacks are currently evaluated using average-case "accuracy" metrics that fail to characterize whether the attack can confidently identify any members of the training set.
We argue that attacks should instead be evaluated by computing their true-positive rate at low false-positive rates, and find most prior attacks perform poorly when evaluated in this way.
Our attack is 10x more powerful at low false-positive rates, and also strictly dominates prior attacks on existing metrics.
- Score: 24.10746844866869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A membership inference attack allows an adversary to query a trained machine
learning model to predict whether or not a particular example was contained in
the model's training dataset. These attacks are currently evaluated using
average-case "accuracy" metrics that fail to characterize whether the attack
can confidently identify any members of the training set. We argue that attacks
should instead be evaluated by computing their true-positive rate at low (e.g.,
<0.1%) false-positive rates, and find most prior attacks perform poorly when
evaluated in this way. To address this we develop a Likelihood Ratio Attack
(LiRA) that carefully combines multiple ideas from the literature. Our attack
is 10x more powerful at low false-positive rates, and also strictly dominates
prior attacks on existing metrics.
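A hedged sketch of the two central ideas above: a LiRA-style per-example likelihood-ratio score built from shadow-model confidences, and a true-positive rate reported at a fixed low false-positive rate. This is a minimal illustration in Python, not the authors' released code; the function names, the logit-scaled confidence input, and the Gaussian fits with a small variance floor are simplifying assumptions made here.

    # Sketch of a likelihood-ratio membership score and of TPR at a fixed low FPR.
    import numpy as np
    from scipy.stats import norm

    def lira_score(target_conf, shadow_confs_in, shadow_confs_out):
        """Likelihood-ratio score for one example.
        target_conf:      logit-scaled confidence of the target model on the example.
        shadow_confs_in:  confidences from shadow models trained WITH the example.
        shadow_confs_out: confidences from shadow models trained WITHOUT the example.
        """
        mu_in, sd_in = np.mean(shadow_confs_in), np.std(shadow_confs_in) + 1e-8
        mu_out, sd_out = np.mean(shadow_confs_out), np.std(shadow_confs_out) + 1e-8
        # Log-ratio of the two Gaussian likelihoods; large values suggest "member".
        return norm.logpdf(target_conf, mu_in, sd_in) - norm.logpdf(target_conf, mu_out, sd_out)

    def tpr_at_fpr(scores, is_member, fpr_target=1e-3):
        """True-positive rate at a fixed false-positive rate (e.g., 0.1%)."""
        scores = np.asarray(scores)
        is_member = np.asarray(is_member, dtype=bool)
        # Threshold chosen so that at most fpr_target of non-members exceed it.
        threshold = np.quantile(scores[~is_member], 1.0 - fpr_target)
        return float(np.mean(scores[is_member] > threshold))

A typical evaluation would score every candidate example with lira_score and then report tpr_at_fpr(scores, membership_labels, fpr_target=0.001) alongside the full ROC curve, rather than a single average-case accuracy.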
Related papers
- Membership Inference Attacks against Language Models via Neighbourhood
Comparison [45.086816556309266]
Membership Inference attacks (MIAs) aim to predict whether a data sample was present in the training data of a machine learning model or not.
Recent work has demonstrated that reference-based attacks which compare model scores to those obtained from a reference model trained on similar data can substantially improve the performance of MIAs.
We investigate their performance in more realistic scenarios and find that they are highly sensitive to the data distribution used to train the reference models.
arXiv Detail & Related papers (2023-05-29T07:06:03Z)
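The entry above concerns reference-based attacks, which calibrate the target model's score on a sample against a reference model trained on similar data. Below is a minimal, hedged sketch of that calibration in Python; target_loss and reference_loss are hypothetical per-example loss queries standing in for real model access, and the zero threshold is an arbitrary placeholder.

    # Sketch of reference-based calibration: a sample looks like a member when the
    # target model's loss on it is unusually low relative to a reference model's loss.
    import numpy as np

    def calibrated_score(x, y, target_loss, reference_loss):
        # target_loss / reference_loss are hypothetical callables returning the
        # per-example loss of the corresponding model on (x, y).
        return reference_loss(x, y) - target_loss(x, y)

    def predict_membership(samples, labels, target_loss, reference_loss, threshold=0.0):
        scores = np.array([calibrated_score(x, y, target_loss, reference_loss)
                           for x, y in zip(samples, labels)])
        return scores > threshold  # True means "predicted training member"

The fragility reported above arises when the reference model's training distribution drifts from the target's; the neighbourhood-comparison method named in the title is not sketched here.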
- Adversarial Attacks are a Surprisingly Strong Baseline for Poisoning Few-Shot Meta-Learners [28.468089304148453]
We attack amortized meta-learners, which allows us to craft colluding sets of inputs that fool the system's learning algorithm.
We show that in a white box setting, these attacks are very successful and can cause the target model's predictions to become worse than chance.
We explore two hypotheses to explain this: 'overfitting' by the attack, and mismatch between the model on which the attack is generated and that to which the attack is transferred.
arXiv Detail & Related papers (2022-11-23T14:55:44Z)
- Towards Fair Classification against Poisoning Attacks [52.57443558122475]
We study the poisoning scenario where the attacker can insert a small fraction of samples into training data.
We propose a general and theoretically guaranteed framework which accommodates traditional defense methods to fair classification against poisoning attacks.
arXiv Detail & Related papers (2022-10-18T00:49:58Z)
- Membership Inference Attacks by Exploiting Loss Trajectory [19.900473800648243]
We propose a new attack method that can exploit the membership information from the whole training process of the target model.
Our attack achieves at least 6x higher true-positive rate at a low false-positive rate of 0.1% than existing methods.
arXiv Detail & Related papers (2022-08-31T16:02:26Z)
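The loss-trajectory entry above turns information from the whole training process into a membership signal. The sketch below is one hedged way to realize that idea: per-checkpoint losses form a feature vector and a binary classifier is fit on shadow trajectories with known membership. It is not the paper's full method (which also uses knowledge distillation for the black-box case), and all names are illustrative.

    # Sketch of a loss-trajectory attack: losses recorded at successive training
    # checkpoints become features for a binary membership classifier.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_trajectory_attack(shadow_trajectories, shadow_membership):
        """shadow_trajectories: (n_examples, n_checkpoints) array of per-checkpoint losses.
        shadow_membership: 0/1 labels saying whether each example was in the shadow
        model's training set."""
        clf = LogisticRegression(max_iter=1000)
        clf.fit(np.asarray(shadow_trajectories), np.asarray(shadow_membership))
        return clf

    def membership_scores(clf, target_trajectories):
        # Probability assigned to the "member" class for each target trajectory.
        return clf.predict_proba(np.asarray(target_trajectories))[:, 1]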
- Identifying a Training-Set Attack's Target Using Renormalized Influence Estimation [11.663072799764542]
This work proposes the task of target identification, which determines whether a specific test instance is the target of a training-set attack.
Rather than focusing on a single attack method or data modality, we build on influence estimation, which quantifies each training instance's contribution to a model's prediction.
arXiv Detail & Related papers (2022-01-25T02:36:34Z)
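The target-identification entry above builds on influence estimation, which scores each training instance's contribution to a prediction. As a hedged illustration, the sketch below computes a plain gradient-dot-product influence score for a binary logistic-regression model; the renormalization proposed in the paper is not reproduced, and all names are made up for this example.

    # Sketch of gradient-based influence estimation for binary logistic regression:
    # the influence of a training point on a test prediction is approximated by the
    # dot product of their per-example loss gradients.
    import numpy as np

    def logistic_grad(w, x, y):
        # Gradient of the logistic loss w.r.t. the weights w, with label y in {0, 1}.
        p = 1.0 / (1.0 + np.exp(-x @ w))
        return (p - y) * x

    def influence_scores(w, X_train, y_train, x_test, y_test):
        g_test = logistic_grad(w, x_test, y_test)
        # One score per training instance; large values mark training instances
        # that strongly influence the test prediction.
        return np.array([logistic_grad(w, x, y) @ g_test
                         for x, y in zip(X_train, y_train)])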
- Membership Inference Attacks on Lottery Ticket Networks [6.1195233829355535]
We show that lottery ticket networks are equally vulnerable to membership inference attacks.
Membership Inference Attacks could leak critical information about the training data that can be used for targeted attacks.
arXiv Detail & Related papers (2021-08-07T19:22:47Z)
- Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z)
- Are Adversarial Examples Created Equal? A Learnable Weighted Minimax
Risk for Robustness under Non-uniform Attacks [70.11599738647963]
Adversarial Training is one of the few defenses that withstand strong attacks.
Traditional defense mechanisms assume a uniform attack over the examples according to the underlying data distribution.
We present a weighted minimax risk optimization that defends against non-uniform attacks.
arXiv Detail & Related papers (2020-10-24T21:20:35Z)
- Label-Only Membership Inference Attacks [67.46072950620247]
We introduce label-only membership inference attacks.
Our attacks evaluate the robustness of a model's predicted labels under perturbations.
We find that training models with differential privacy or with (strong) L2 regularization are the only known defense strategies.
arXiv Detail & Related papers (2020-07-28T15:44:31Z)
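The label-only entry above infers membership purely from how robust the model's predicted label is under perturbations. Below is a minimal hedged sketch using random Gaussian noise; predict_label is a hypothetical black-box query returning only the hard label, and the paper's stronger variant based on adversarial (decision-boundary) perturbations is not reproduced.

    # Sketch of a label-only membership signal: query noisy copies of the input and
    # measure how often the predicted label survives; training members tend to keep
    # their label under larger perturbations than non-members.
    import numpy as np

    def label_robustness(x, predict_label, noise_scale=0.05, n_trials=100, seed=0):
        rng = np.random.default_rng(seed)
        base_label = predict_label(x)
        kept = 0
        for _ in range(n_trials):
            x_noisy = x + noise_scale * rng.standard_normal(x.shape)
            kept += int(predict_label(x_noisy) == base_label)
        return kept / n_trials  # higher -> more likely a training-set member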
- Adversarial Example Games [51.92698856933169]
Adversarial Example Games (AEG) is a framework that models the crafting of adversarial examples.
AEG provides a new way to design adversarial examples by adversarially training a generator and a classifier from a given hypothesis class.
We demonstrate the efficacy of AEG on the MNIST and CIFAR-10 datasets.
arXiv Detail & Related papers (2020-07-01T19:47:23Z)
- Revisiting Membership Inference Under Realistic Assumptions [87.13552321332988]
We study membership inference in settings where some of the assumptions typically used in previous research are relaxed.
This setting is more realistic than the balanced prior setting typically considered by researchers.
We develop a new inference attack based on the intuition that inputs corresponding to training set members will be near a local minimum in the loss function.
arXiv Detail & Related papers (2020-05-21T20:17:42Z)
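The last entry's intuition is that inputs corresponding to training members sit near a local minimum of the loss, so small perturbations of the input should almost always increase the loss. A minimal hedged sketch of that test follows; loss_fn is a hypothetical per-example loss query and the noise scale is an arbitrary choice.

    # Sketch of the local-minimum membership test: perturb the input with small random
    # noise and count how often the loss goes up relative to the unperturbed input.
    import numpy as np

    def fraction_loss_increases(x, y, loss_fn, noise_scale=0.01, n_trials=100, seed=0):
        rng = np.random.default_rng(seed)
        base_loss = loss_fn(x, y)
        increases = 0
        for _ in range(n_trials):
            x_noisy = x + noise_scale * rng.standard_normal(x.shape)
            increases += int(loss_fn(x_noisy, y) > base_loss)
        return increases / n_trials  # values near 1.0 suggest a training-set member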