Epsilon*: Privacy Metric for Machine Learning Models
- URL: http://arxiv.org/abs/2307.11280v3
- Date: Fri, 9 Feb 2024 23:32:58 GMT
- Title: Epsilon*: Privacy Metric for Machine Learning Models
- Authors: Diana M. Negoescu, Humberto Gonzalez, Saad Eddin Al Orjany, Jilei
Yang, Yuliia Lut, Rahul Tandra, Xiaowen Zhang, Xinyi Zheng, Zach Douglas,
Vidita Nolkha, Parvez Ahammad, Gennady Samorodnitsky
- Abstract summary: Epsilon* is a new metric for measuring the privacy risk of a single model instance prior to, during, or after deployment of privacy mitigation strategies.
It requires only black-box access to model predictions, does not require training data re-sampling or model re-training, and can be used to measure the privacy risk of models not trained with differential privacy.
- Score: 7.461284823977013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Epsilon*, a new privacy metric for measuring the privacy risk of
a single model instance prior to, during, or after deployment of privacy
mitigation strategies. The metric requires only black-box access to model
predictions, does not require training data re-sampling or model re-training,
and can be used to measure the privacy risk of models not trained with
differential privacy. Epsilon* is a function of true positive and false
positive rates in a hypothesis test used by an adversary in a membership
inference attack. We distinguish between quantifying the privacy loss of a
trained model instance, which we refer to as empirical privacy, and quantifying
the privacy loss of the training mechanism which produces this model instance.
Existing approaches in the privacy auditing literature provide lower bounds for
the latter, while our metric provides an empirical lower bound for the former
by relying on an (${\epsilon}$, ${\delta}$)-type of quantification of the
privacy of the trained model instance. We establish a relationship between
these lower bounds and show how to implement Epsilon* to avoid numerical and
noise amplification instability. We further show in experiments on benchmark
public data sets that Epsilon* is sensitive to privacy risk mitigation by
training with differential privacy (DP), where the value of Epsilon* is reduced
by up to 800% compared to the Epsilon* values of non-DP trained baseline
models. This metric allows privacy auditors to be independent of model owners,
and enables visualizing the privacy-utility landscape to make informed
decisions regarding the trade-offs between model privacy and utility.
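The abstract describes Epsilon* as a function of the true positive and false positive rates of the hypothesis test performed by a membership inference adversary. A minimal sketch of that construction is given below, assuming only the standard (epsilon, delta)-DP hypothesis-testing constraints (TPR <= e^eps * FPR + delta and 1 - FPR <= e^eps * (1 - TPR) + delta) and black-box attack scores; the function names and the score-thresholding loop are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def epsilon_lower_bound(tpr, fpr, delta=1e-5, tol=1e-12):
    """Empirical epsilon implied by a single (TPR, FPR) operating point.

    Under (epsilon, delta)-DP, any membership inference attack satisfies
        TPR <= exp(eps) * FPR + delta
        1 - FPR <= exp(eps) * (1 - TPR) + delta,
    so an observed operating point certifies eps >= max of the two log-ratios.
    """
    bounds = [0.0]
    if fpr > tol:
        bounds.append(np.log(max(tpr - delta, tol) / fpr))
    fnr = 1.0 - tpr
    if fnr > tol:
        bounds.append(np.log(max(1.0 - fpr - delta, tol) / fnr))
    return max(bounds)

def epsilon_star_sketch(member_scores, non_member_scores, delta=1e-5):
    """Maximize the implied epsilon over all thresholds of a black-box attack
    score (higher score = "more member-like", e.g. negative per-example loss)."""
    member_scores = np.asarray(member_scores, dtype=float)
    non_member_scores = np.asarray(non_member_scores, dtype=float)
    best = 0.0
    for t in np.unique(np.concatenate([member_scores, non_member_scores])):
        tpr = float(np.mean(member_scores >= t))      # members correctly flagged
        fpr = float(np.mean(non_member_scores >= t))  # non-members wrongly flagged
        best = max(best, epsilon_lower_bound(tpr, fpr, delta))
    return best
```

Because the attack only needs model predictions for known member and non-member examples, this kind of estimate matches the black-box, no-retraining setting the abstract emphasizes.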
Related papers
- Initialization Matters: Privacy-Utility Analysis of Overparameterized
Neural Networks [72.51255282371805]
We prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets.
We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training.
arXiv Detail & Related papers (2023-10-31T16:13:22Z)
- Conditional Density Estimations from Privacy-Protected Data [0.0]
We propose simulation-based inference methods from privacy-protected datasets.
We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models.
arXiv Detail & Related papers (2023-10-19T14:34:17Z)
- Probing the Transition to Dataset-Level Privacy in ML Models Using an Output-Specific and Data-Resolved Privacy Profile [23.05994842923702]
We study a privacy metric that quantifies the extent to which a model trained on a dataset using a Differential Privacy mechanism is "covered" by each of the distributions resulting from training on neighboring datasets.
We show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as $\epsilon$ decreases.
arXiv Detail & Related papers (2023-06-27T20:39:07Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
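The DOMIAS entry above describes a density-based membership score that targets local overfitting of a generative model. A rough sketch of that idea, assuming a simple density-ratio score and a kernel density estimator as a stand-in (the scoring details are assumptions, not the authors' code), is:

```python
import numpy as np
from scipy.stats import gaussian_kde

def density_ratio_membership_scores(synthetic, reference, queries, tol=1e-12):
    """Density-ratio membership score in the spirit of DOMIAS (an illustrative
    sketch): where the generative model's density exceeds the density of
    independent reference data, the generator has locally overfit, which is
    taken as evidence that the query record was in its training set.

    All inputs are arrays of shape (n_samples, n_features); the Gaussian KDE is
    only a placeholder for whichever density estimator one prefers.
    """
    p_synth = gaussian_kde(np.asarray(synthetic).T)  # density of generated samples
    p_ref = gaussian_kde(np.asarray(reference).T)    # density of non-member reference data
    q = np.asarray(queries).T
    return p_synth(q) / (p_ref(q) + tol)

# Usage: rank candidate records by this ratio; the highest-scoring records are
# the ones inferred to be members of the generator's training data.
```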
- Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis Testing: A Lesson From Fano [83.5933307263932]
We study data reconstruction attacks for discrete data and analyze them under the framework of hypothesis testing.
We show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $\epsilon$ can be $O(\log M)$ before the adversary gains significant inferential power.
arXiv Detail & Related papers (2022-10-24T23:50:12Z)
- Bayesian Estimation of Differential Privacy [0.0]
Differentially Private SGD enables training machine learning models with formal privacy guarantees.
There is a discrepancy between the protection that such algorithms guarantee in theory and the protection they afford in practice.
This paper empirically estimates the protection afforded by differentially private training as a confidence interval for the privacy budget.
arXiv Detail & Related papers (2022-06-10T15:57:18Z)
- Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies that groups underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z)
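The individual-accounting entry above characterizes per-example guarantees for DP-SGD. A simplified sketch of why examples with small gradients get stronger guarantees, assuming a fixed per-example gradient norm across steps, ignoring subsampling amplification, and using the standard RDP analysis of the Gaussian mechanism (this is an illustration, not the paper's accountant):

```python
import numpy as np

def per_example_epsilon(grad_norms, clip_norm, noise_multiplier, steps,
                        delta=1e-5, orders=np.arange(2, 64, dtype=float)):
    """Per-example (epsilon, delta) estimates in the spirit of individual privacy
    accounting for DP-SGD (a simplified sketch).

    An example whose gradient norm stays below the clipping bound C has effective
    sensitivity min(||g||, C) <= C, so it enjoys a smaller epsilon than the
    worst-case guarantee computed with sensitivity C.
    """
    sigma = noise_multiplier * clip_norm                       # std of the added Gaussian noise
    sens = np.minimum(np.asarray(grad_norms, dtype=float), clip_norm)
    eps = []
    for s in sens:
        # RDP of the Gaussian mechanism at each order, composed over `steps` steps,
        # then converted to (epsilon, delta)-DP via the standard RDP conversion.
        rdp = steps * orders * s**2 / (2.0 * sigma**2)
        eps.append(float(np.min(rdp + np.log(1.0 / delta) / (orders - 1.0))))
    return np.array(eps)
```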
- Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z)
- Quantifying identifiability to choose and audit $\epsilon$ in differentially private deep learning [15.294433619347082]
To use differential privacy in machine learning, data scientists must choose privacy parameters $(\epsilon,\delta)$.
We transform $(\epsilon,\delta)$ to a bound on the Bayesian posterior belief of the adversary assumed by differential privacy concerning the presence of any record in the training dataset.
We formulate an implementation of this differential privacy adversary that allows data scientists to audit model training and compute empirical identifiability scores and empirical $(\epsilon,\delta)$.
arXiv Detail & Related papers (2021-03-04T09:35:58Z)
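The identifiability entry above turns $(\epsilon,\delta)$ into a bound on the adversary's posterior belief. A simplified, delta = 0 version of that transformation (for pure epsilon-DP and a chosen prior; the general case handled by the paper is not reproduced here) follows from Bayes' rule with a likelihood ratio of at most e^epsilon between neighboring datasets:

```python
import math

def posterior_belief_bound(epsilon, prior=0.5):
    """Upper bound on the DP adversary's posterior belief that a given record is
    in the training set, under pure epsilon-DP (delta = 0) and the given prior.

    Bayes' rule with likelihood ratio at most exp(epsilon) gives
        posterior <= prior * e^eps / (prior * e^eps + (1 - prior)).
    """
    w = prior * math.exp(epsilon)
    return w / (w + (1.0 - prior))

# Example: epsilon = 1.0 with a uniform prior bounds the posterior at about 0.73,
# i.e. the adversary cannot be much more than 73% sure the record was used.
```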
- Robustness Threats of Differential Privacy [70.818129585404]
We experimentally demonstrate that networks trained with differential privacy can, in some settings, be even more vulnerable than their non-private counterparts.
We study how the main ingredients of differentially private neural network training, such as gradient clipping and noise addition, affect the robustness of the model.
arXiv Detail & Related papers (2020-12-14T18:59:24Z)
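The two ingredients the last entry studies, per-example gradient clipping and Gaussian noise addition, are the core of the standard DP-SGD update. A generic sketch of one such step (not the cited paper's code; array shapes and defaults are illustrative):

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=None):
    """One DP-SGD update: clip each example's gradient, add calibrated Gaussian
    noise to the sum, then take an ordinary gradient step.

    per_example_grads: array of shape (batch_size, n_params).
    """
    rng = np.random.default_rng() if rng is None else rng

    # 1. Clip each example's gradient to L2 norm at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

    # 2. Sum the clipped gradients and add noise scaled to the clipping bound.
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=params.shape)

    # 3. Gradient step on the privatized, averaged gradient.
    return params - lr * noisy_sum / per_example_grads.shape[0]
```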