Adversary Instantiation: Lower Bounds for Differentially Private Machine
Learning
- URL: http://arxiv.org/abs/2101.04535v1
- Date: Mon, 11 Jan 2021 18:47:11 GMT
- Title: Adversary Instantiation: Lower Bounds for Differentially Private Machine
Learning
- Authors: Milad Nasr, Shuang Song, Abhradeep Thakurta, Nicolas Papernot and
Nicholas Carlini
- Abstract summary: Differentially private (DP) machine learning allows us to train models on private data while limiting data leakage.
In this paper, we evaluate the importance of the adversary capabilities allowed in the privacy analysis of DP training algorithms.
- Score: 43.6041475698327
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differentially private (DP) machine learning allows us to train models on
private data while limiting data leakage. DP formalizes this data leakage
through a cryptographic game, where an adversary must predict if a model was
trained on a dataset D, or a dataset D' that differs in just one example. If
observing the training algorithm does not meaningfully increase the adversary's
odds of successfully guessing which dataset the model was trained on, then the
algorithm is said to be differentially private. Hence, the purpose of privacy
analysis is to upper bound the probability that any adversary could
successfully guess which dataset the model was trained on. In our paper, we
instantiate this hypothetical adversary in order to establish lower bounds on
the probability that this distinguishing game can be won. We use this adversary
to evaluate the importance of the adversary capabilities allowed in the privacy
analysis of DP training algorithms. For DP-SGD, the most common method for
training neural networks with differential privacy, our lower bounds are tight
and match the theoretical upper bound. This implies that in order to prove
better upper bounds, it will be necessary to make use of additional
assumptions. Fortunately, we find that our attacks are significantly weaker
when additional (realistic) restrictions are placed on the adversary's
capabilities. Thus, in the practical setting common to many real-world
deployments, there is a gap between our lower bounds and the upper bounds
provided by the analysis: differential privacy is conservative and adversaries
may not be able to leak as much information as suggested by the theoretical
bound.
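The abstract frames privacy analysis as an upper bound on an adversary's success in a distinguishing game between datasets D and D'. The sketch below is a minimal illustration under stated assumptions, not the paper's attack: it shows the game loop and one standard form of the (epsilon, delta)-DP cap on the adversary's success probability with a uniform prior; `train` and `attack` are hypothetical placeholders.
```python
import math
import random


def max_win_probability(epsilon: float, delta: float) -> float:
    """One standard form of the (epsilon, delta)-DP upper bound on the
    adversary's probability of guessing correctly in the D-vs-D'
    distinguishing game, assuming a uniform prior over the two datasets.
    For delta = 0 it reduces to e^eps / (1 + e^eps)."""
    return (math.exp(epsilon) + delta) / (1.0 + math.exp(epsilon))


def distinguishing_game(train, attack, D, D_prime, trials=1000):
    """Monte-Carlo estimate of an attack's success rate in the game.
    `train` (the randomized training algorithm) and `attack` (which returns
    a guess in {0, 1}) are hypothetical placeholders, not the paper's code."""
    wins = 0
    for _ in range(trials):
        secret_bit = random.randint(0, 1)          # which dataset is used
        model = train(D if secret_bit == 0 else D_prime)
        wins += int(attack(model, D, D_prime) == secret_bit)
    return wins / trials


# Example: with (epsilon, delta) = (2, 1e-5), no adversary should exceed
# roughly 88% accuracy in the distinguishing game.
print(max_win_probability(2.0, 1e-5))
```
Lower bounds of the kind the paper constructs come from instantiating the attack concretely and measuring how close its empirical success rate gets to this theoretical ceiling.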
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- Closed-Form Bounds for DP-SGD against Record-level Inference [18.85865832127335]
We focus on the popular DP-SGD algorithm, and derive simple closed-form bounds.
We obtain bounds for membership inference that match state-of-the-art techniques.
We present a novel data-dependent bound against attribute inference.
arXiv Detail & Related papers (2024-02-22T09:26:16Z)
- Gradients Look Alike: Sensitivity is Often Overestimated in DP-SGD [44.11069254181353]
We show that DP-SGD leaks significantly less privacy for many datapoints when models are trained on common benchmarks.
This implies privacy attacks will necessarily fail against many datapoints if the adversary does not have sufficient control over the possible training datasets.
arXiv Detail & Related papers (2023-07-01T11:51:56Z)
- Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis Testing: A Lesson From Fano [83.5933307263932]
We study data reconstruction attacks for discrete data and analyze them under the framework of hypothesis testing.
We show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $\epsilon$ can be $O(\log M)$ before the adversary gains significant inferential power.
arXiv Detail & Related papers (2022-10-24T23:50:12Z)
- Fine-Tuning with Differential Privacy Necessitates an Additional Hyperparameter Search [38.83524780461911]
We show how carefully selecting the layers being fine-tuned in the pretrained neural network allows us to establish new state-of-the-art tradeoffs between privacy and accuracy.
We achieve 77.9% accuracy for $(\varepsilon, \delta) = (2, 10^{-5})$ on CIFAR-100 for a model pretrained on ImageNet.
arXiv Detail & Related papers (2022-10-05T11:32:49Z)
- Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z)
- Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z)
- Quantifying identifiability to choose and audit $\epsilon$ in differentially private deep learning [15.294433619347082]
To use differential privacy in machine learning, data scientists must choose privacy parameters $(\epsilon,\delta)$.
We transform $(\epsilon,\delta)$ into a bound on the Bayesian posterior belief of the adversary assumed by differential privacy concerning the presence of any record in the training dataset.
We formulate an implementation of this differential privacy adversary that allows data scientists to audit model training and compute empirical identifiability scores and empirical $(\epsilon,\delta)$ (a brief derivation sketch of this posterior bound follows this entry).
arXiv Detail & Related papers (2021-03-04T09:35:58Z)
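The identifiability entry above transforms $(\epsilon,\delta)$ into a bound on the adversary's posterior belief. The LaTeX fragment below is a minimal derivation sketch for the $\delta = 0$ case under a uniform-prior assumption; it is not the paper's exact formula.
```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Minimal sketch: how an (epsilon, 0)-DP guarantee caps the posterior belief
% that a record r is in the training data, given the mechanism's output o.
% D and D' are neighboring datasets differing only in r; p is the prior.
\begin{align*}
\Pr[r \in D_{\mathrm{train}} \mid o]
  &= \frac{p\,\Pr[M(D) = o]}{p\,\Pr[M(D) = o] + (1-p)\,\Pr[M(D') = o]} \\
  &\le \frac{p\,e^{\epsilon}}{p\,e^{\epsilon} + (1-p)}
     \quad \text{since } \Pr[M(D) = o] \le e^{\epsilon} \Pr[M(D') = o].
\end{align*}
For a uniform prior $p = \tfrac{1}{2}$ the posterior is at most
$e^{\epsilon} / (1 + e^{\epsilon})$; a nonzero $\delta$ adds a further
additive correction, which the paper's bound accounts for.
\end{document}
```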
- User-Level Privacy-Preserving Federated Learning: Analysis and Performance Optimization [77.43075255745389]
Federated learning (FL) is capable of preserving private data from mobile terminals (MTs) while training the data into useful models.
From a viewpoint of information theory, it is still possible for a curious server to infer private information from the shared models uploaded by MTs.
We propose a user-level differential privacy (UDP) algorithm that adds artificial noise to the shared models before uploading them to servers (a minimal sketch of this noise-addition step follows below).
arXiv Detail & Related papers (2020-02-29T10:13:39Z)
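The UDP entry above adds artificial noise to the shared models before they are uploaded. Below is a minimal, hypothetical sketch of that step, not the paper's algorithm: it clips a client's update to bound its L2 sensitivity and adds Gaussian noise calibrated to that bound; the function name and parameter values are illustrative.
```python
import numpy as np


def privatize_update(local_update, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip a client's model update to an L2 norm of `clip_norm`, then add
    Gaussian noise with standard deviation noise_multiplier * clip_norm to
    every tensor before the update is uploaded to the server.
    All parameter values here are illustrative, not the UDP paper's settings."""
    rng = rng or np.random.default_rng()
    flat = np.concatenate([w.ravel() for w in local_update])
    scale = min(1.0, clip_norm / (np.linalg.norm(flat) + 1e-12))
    noisy = []
    for w in local_update:
        noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
        noisy.append(w * scale + noise)
    return noisy


# Toy example: a two-tensor update from one mobile terminal (MT).
update = [np.ones((3, 3)), np.ones(5)]
print([w.shape for w in privatize_update(update)])
```
Clipping bounds each user's contribution to the shared model, which is what lets the Gaussian noise level be tied to a user-level sensitivity.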