Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets
- URL: http://arxiv.org/abs/2204.00032v1
- Date: Thu, 31 Mar 2022 18:06:28 GMT
- Title: Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets
- Authors: Florian Tramèr, Reza Shokri, Ayrton San Joaquin, Hoang Le, Matthew Jagielski, Sanghyun Hong, and Nicholas Carlini
- Abstract summary: We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak private details belonging to other parties.
Our attacks are effective across membership inference, attribute inference, and data extraction.
Our results cast doubt on the relevance of cryptographic privacy guarantees in multiparty protocols for machine learning.
- Score: 53.866927712193416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new class of attacks on machine learning models. We show that
an adversary who can poison a training dataset can cause models trained on this
dataset to leak significant private details of training points belonging to
other parties. Our active inference attacks connect two independent lines of
work targeting the integrity and privacy of machine learning training data.
Our attacks are effective across membership inference, attribute inference,
and data extraction. For example, our targeted attacks can poison <0.1% of the
training dataset to boost the performance of inference attacks by 1 to 2 orders
of magnitude. Further, an adversary who controls a significant fraction of the
training data (e.g., 50%) can launch untargeted attacks that enable 8x more
precise inference on all other users' otherwise-private data points.
Our results cast doubt on the relevance of cryptographic privacy guarantees
in multiparty computation protocols for machine learning, if parties can
arbitrarily select their share of training data.
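The targeted attack described in the abstract can be pictured with a toy loss-gap experiment. The sketch below is a minimal illustration, not the paper's pipeline: it assumes a logistic-regression victim, synthetic data, and the common loss-threshold membership test, and it only shows how inserting a few mislabeled copies of a target record (the poisoning step the abstract alludes to) could widen the gap that the membership test relies on. All names and parameters are illustrative.

```python
# Toy sketch (assumptions: logistic-regression victim, synthetic data,
# loss-threshold membership test). Illustrates how mislabeled copies of a
# target record could widen the member/non-member loss gap; not the paper's
# actual experimental setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

n, d = 500, 20
X = rng.normal(size=(n, d))
w = rng.normal(size=d)
y = (X @ w + 0.5 * rng.normal(size=n) > 0).astype(int)

x_t, y_t = X[0], y[0]          # the target record under attack
X_rest, y_rest = X[1:], y[1:]  # everyone else's data

def target_loss(X_train, y_train):
    """Cross-entropy loss on the target of a model trained on (X_train, y_train)."""
    model = LogisticRegression(max_iter=2000).fit(X_train, y_train)
    p = model.predict_proba(x_t[None, :])[0, y_t]
    return -np.log(p + 1e-12)

# Baseline membership signal: loss with the target IN vs. OUT of training.
loss_in = target_loss(np.vstack([X_rest, x_t]), np.append(y_rest, y_t))
loss_out = target_loss(X_rest, y_rest)

# Poisoning step: the adversary contributes k mislabeled copies of the target.
k = 8
X_poi = np.vstack([X_rest] + [x_t[None, :]] * k)
y_poi = np.append(y_rest, [1 - y_t] * k)
loss_in_poi = target_loss(np.vstack([X_poi, x_t]), np.append(y_poi, y_t))
loss_out_poi = target_loss(X_poi, y_poi)

print(f"clean loss gap (out - in):    {loss_out - loss_in:.3f}")
print(f"poisoned loss gap (out - in): {loss_out_poi - loss_in_poi:.3f}")
```

Under these assumptions, the model trained without the target is pushed to be confidently wrong on it, so its loss on the true label rises sharply unless the true record is actually present in training; that widened gap is the amplification effect the abstract quantifies.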
Related papers
- Students Parrot Their Teachers: Membership Inference on Model Distillation [54.392069096234074]
We study the privacy provided by knowledge distillation to both the teacher and student training sets.
Our attacks are strongest when student and teacher sets are similar, or when the attacker can poison the teacher set.
arXiv Detail & Related papers (2023-03-06T19:16:23Z)
- Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers [71.70205894168039]
We consider instance-wise unlearning, whose goal is to delete information about a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
arXiv Detail & Related papers (2023-01-27T07:53:50Z)
- Redactor: Targeted Disinformation Generation using Probabilistic Decision Boundaries [7.303121062667876]
We study the problem of targeted disinformation where the goal is to lower the accuracy of inference attacks on a specific target.
We show that our problem is best solved by finding the points closest to the target in the input space that will be labeled as a different class (a minimal nearest-neighbor sketch follows this entry).
We also propose techniques for making the disinformation realistic.
arXiv Detail & Related papers (2022-02-07T01:43:25Z)
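For the Redactor entry above, the core geometric step (locating the nearest points that carry a different label than the target) can be sketched in a few lines. This is a hypothetical illustration of that step only, on synthetic data; it is not the paper's full method for generating realistic disinformation.

```python
# Illustrative sketch: nearest differently-labeled points to a target record.
# Synthetic data and names are hypothetical; only the geometric search is shown.
import numpy as np

rng = np.random.default_rng(1)
pool_X = rng.normal(size=(1000, 16))          # candidate points in input space
pool_y = (pool_X[:, 0] > 0).astype(int)       # their labels
target_x, target_y = rng.normal(size=16), 1   # the record to protect

mask = pool_y != target_y                               # keep other-class points only
dists = np.linalg.norm(pool_X[mask] - target_x, axis=1)
closest = pool_X[mask][np.argsort(dists)[:5]]           # 5 nearest other-class points

print(closest.shape)  # candidate disinformation points near the target
```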
- Enhanced Membership Inference Attacks against Machine Learning Models [9.26208227402571]
Membership inference attacks are used to quantify the private information that a model leaks about the individual data points in its training set.
We derive new attack algorithms that achieve a high AUC score while also highlighting the different factors that affect their performance (a toy scoring sketch follows this entry).
Our algorithms capture a very precise approximation of privacy loss and can be used as a tool to perform an accurate and informed estimation of the privacy risk of machine learning models.
arXiv Detail & Related papers (2021-11-18T13:31:22Z)
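As a companion to the AUC framing in the entry above, the snippet below shows one common way a membership-inference score is turned into an AUC: score each candidate point (here by its negative loss, a standard but assumed convention) and compute the ROC AUC against the true member/non-member labels. The score distributions are synthetic placeholders, not results from the paper.

```python
# Minimal sketch: membership-inference AUC from per-example scores.
# The gamma-distributed losses below are synthetic placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

member_losses = rng.gamma(shape=2.0, scale=0.2, size=1000)      # members: lower loss
non_member_losses = rng.gamma(shape=2.0, scale=0.5, size=1000)  # non-members: higher loss

scores = np.concatenate([-member_losses, -non_member_losses])   # higher score => "member"
labels = np.concatenate([np.ones(1000), np.zeros(1000)])        # 1 = member

print(f"attack AUC: {roc_auc_score(labels, scores):.3f}")
```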
- Privacy-Preserving Federated Learning on Partitioned Attributes [6.661716208346423]
Federated learning empowers collaborative training without exposing local data or models.
We introduce an adversarial-learning-based procedure that tunes a local model to release privacy-preserving intermediate representations.
To alleviate the accuracy decline, we propose a defense method based on the forward-backward splitting algorithm.
arXiv Detail & Related papers (2021-04-29T14:49:14Z)
- Property Inference Attacks on Convolutional Neural Networks: Influence and Implications of Target Model's Complexity [1.2891210250935143]
Property Inference Attacks aim to infer, from a given model, properties of the training dataset that are seemingly unrelated to the model's primary goal.
This paper investigates the influence of the target model's complexity on the accuracy of this type of attack.
Our findings reveal that the risk of a privacy breach is present independently of the target model's complexity.
arXiv Detail & Related papers (2021-04-27T09:19:36Z)
- Curse or Redemption? How Data Heterogeneity Affects the Robustness of Federated Learning [51.15273664903583]
Data heterogeneity has been identified as one of the key features of federated learning, but it is often overlooked through the lens of robustness to adversarial attacks.
This paper focuses on characterizing and understanding its impact on backdoor attacks in federated learning through comprehensive experiments using synthetic data and the LEAF benchmarks.
arXiv Detail & Related papers (2021-02-01T06:06:21Z)
- Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike other standard membership adversaries, works under the severe restriction of having no access to the victim model's scores (a label-only probing sketch follows this entry).
We show that a victim model that publishes only labels is still susceptible to sampling attacks, and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
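The sampling-attack entry above works in the label-only setting; one plausible way to picture the membership signal it exploits is label stability under repeated noisy queries, sketched below on a toy victim. The victim model, noise scale, and query budget are assumptions for illustration, not the paper's configuration.

```python
# Sketch of a label-only membership signal: query the victim repeatedly with
# noisy versions of a candidate point and measure how often the label is stable.
# Victim model, noise scale, and query budget are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
victim = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def label_stability(x, n_queries=64, sigma=0.3):
    """Fraction of noisy queries whose predicted label matches the clean prediction."""
    base = victim.predict(x[None, :])[0]
    noisy = x + sigma * rng.normal(size=(n_queries, x.shape[0]))
    return float(np.mean(victim.predict(noisy) == base))

print("training point:", label_stability(X[0]))
print("fresh point:   ", label_stability(rng.normal(size=10)))
```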
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.