Calibrating Practical Privacy Risks for Differentially Private Machine Learning
- URL: http://arxiv.org/abs/2410.22673v1
- Date: Wed, 30 Oct 2024 03:52:01 GMT
- Title: Calibrating Practical Privacy Risks for Differentially Private Machine Learning
- Authors: Yuechun Gu, Keke Chen,
- Abstract summary: We study the approaches that can lower the attacking success rate to allow for more flexible privacy budget settings in model training.
We find that by selectively suppressing privacy-sensitive features, we can achieve lower ASR values without compromising application-specific data utility.
- Score: 5.363664265121231
- License:
- Abstract: Differential privacy quantifies privacy through the privacy budget $\epsilon$, yet its practical interpretation is complicated by variations across models and datasets. Recent research on differentially private machine learning and membership inference has highlighted that with the same theoretical $\epsilon$ setting, the likelihood-ratio-based membership inference (LiRA) attacking success rate (ASR) may vary according to specific datasets and models, which might be a better indicator for evaluating real-world privacy risks. Inspired by this practical privacy measure, we study the approaches that can lower the attacking success rate to allow for more flexible privacy budget settings in model training. We find that by selectively suppressing privacy-sensitive features, we can achieve lower ASR values without compromising application-specific data utility. We use the SHAP and LIME model explainer to evaluate feature sensitivities and develop feature-masking strategies. Our findings demonstrate that the LiRA $ASR^M$ on model $M$ can properly indicate the inherent privacy risk of a dataset for modeling, and it's possible to modify datasets to enable the use of larger theoretical $\epsilon$ settings to achieve equivalent practical privacy protection. We have conducted extensive experiments to show the inherent link between ASR and the dataset's privacy risk. By carefully selecting features to mask, we can preserve more data utility with equivalent practical privacy protection and relaxed $\epsilon$ settings. The implementation details are shared online at the provided GitHub URL \url{https://anonymous.4open.science/r/On-sensitive-features-and-empirical-epsilon-lower-bounds-BF67/}.
Related papers
- Share Your Representation Only: Guaranteed Improvement of the
Privacy-Utility Tradeoff in Federated Learning [47.042811490685324]
Mitigating the risk of this information leakage, using state of the art differentially private algorithms, also does not come for free.
In this paper, we consider a representation learning objective that various parties collaboratively refine on a federated model, with differential privacy guarantees.
We observe a significant performance improvement over the prior work under the same small privacy budget.
arXiv Detail & Related papers (2023-09-11T14:46:55Z) - Epsilon*: Privacy Metric for Machine Learning Models [7.461284823977013]
Epsilon* is a new metric for measuring the privacy risk of a single model instance prior to, during, or after deployment of privacy mitigation strategies.
It requires only black-box access to model predictions, does not require training data re-sampling or model re-training, and can be used to measure the privacy risk of models not trained with differential privacy.
arXiv Detail & Related papers (2023-07-21T00:49:07Z) - Probing the Transition to Dataset-Level Privacy in ML Models Using an
Output-Specific and Data-Resolved Privacy Profile [23.05994842923702]
We study a privacy metric that quantifies the extent to which a model trained on a dataset using a Differential Privacy mechanism is covered" by each of the distributions resulting from training on neighboring datasets.
We show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as $epsilon$ decreases.
arXiv Detail & Related papers (2023-06-27T20:39:07Z) - Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis
Testing: A Lesson From Fano [83.5933307263932]
We study data reconstruction attacks for discrete data and analyze it under the framework of hypothesis testing.
We show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $epsilon$ can be $O(log M)$ before the adversary gains significant inferential power.
arXiv Detail & Related papers (2022-10-24T23:50:12Z) - Smooth Anonymity for Sparse Graphs [69.1048938123063]
differential privacy has emerged as the gold standard of privacy, however, when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
arXiv Detail & Related papers (2022-07-13T17:09:25Z) - Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z) - Quantifying identifiability to choose and audit $\epsilon$ in
differentially private deep learning [15.294433619347082]
To use differential privacy in machine learning, data scientists must choose privacy parameters $(epsilon,delta)$.
We transform $(epsilon,delta)$ to a bound on the Bayesian posterior belief of the adversary assumed by differential privacy concerning the presence of any record in the training dataset.
We formulate an implementation of this differential privacy adversary that allows data scientists to audit model training and compute empirical identifiability scores and empirical $(epsilon,delta)$.
arXiv Detail & Related papers (2021-03-04T09:35:58Z) - Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for
Private Learning [74.73901662374921]
A differentially private model degrades the utility drastically when the model comprises a large number of trainable parameters.
We propose an algorithm emphGradient Embedding Perturbation (GEP) towards training differentially private deep models with decent accuracy.
arXiv Detail & Related papers (2021-02-25T04:29:58Z) - On Deep Learning with Label Differential Privacy [54.45348348861426]
We study the multi-class classification setting where the labels are considered sensitive and ought to be protected.
We propose a new algorithm for training deep neural networks with label differential privacy, and run evaluations on several datasets.
arXiv Detail & Related papers (2021-02-11T15:09:06Z) - Robustness Threats of Differential Privacy [70.818129585404]
We experimentally demonstrate that networks, trained with differential privacy, in some settings might be even more vulnerable in comparison to non-private versions.
We study how the main ingredients of differentially private neural networks training, such as gradient clipping and noise addition, affect the robustness of the model.
arXiv Detail & Related papers (2020-12-14T18:59:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.