Arbitrary Decisions are a Hidden Cost of Differentially Private Training
- URL: http://arxiv.org/abs/2302.14517v2
- Date: Mon, 15 May 2023 15:07:24 GMT
- Title: Arbitrary Decisions are a Hidden Cost of Differentially Private Training
- Authors: Bogdan Kulynych, Hsiang Hsu, Carmela Troncoso, Flavio P. Calmon
- Abstract summary: Mechanisms used in machine learning often aim to guarantee differential privacy (DP) during model training.
Practical DP-ensuring training methods use randomization when fitting model parameters to privacy-sensitive data.
For a given input example, the output predicted by equally-private models depends on the randomness used in training.
- Score: 7.560688419767116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mechanisms used in privacy-preserving machine learning often aim to guarantee
differential privacy (DP) during model training. Practical DP-ensuring training
methods use randomization when fitting model parameters to privacy-sensitive
data (e.g., adding Gaussian noise to clipped gradients). We demonstrate that
such randomization incurs predictive multiplicity: for a given input example,
the output predicted by equally-private models depends on the randomness used
in training. Thus, for a given input, the predicted output can vary drastically
if a model is re-trained, even if the same training dataset is used. The
predictive-multiplicity cost of DP training has not been studied, and is
currently neither audited for nor communicated to model designers and
stakeholders. We derive a bound on the number of re-trainings required to
estimate predictive multiplicity reliably. We analyze--both theoretically and
through extensive experiments--the predictive-multiplicity cost of three
DP-ensuring algorithms: output perturbation, objective perturbation, and
DP-SGD. We demonstrate that the degree of predictive multiplicity rises as the
level of privacy increases, and is unevenly distributed across individuals and
demographic groups in the data. Because randomness used to ensure DP during
training explains predictions for some examples, our results highlight a
fundamental challenge to the justifiability of decisions supported by
differentially private models in high-stakes settings. We conclude that
practitioners should audit the predictive multiplicity of their DP-ensuring
algorithms before deploying them in applications of individual-level
consequence.
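The abstract's claim can be checked operationally: re-train the same DP pipeline several times, vary only the training randomness, and measure how often each example's predicted label flips. The sketch below is a minimal illustration of such an audit and is not the authors' code: it uses a toy output-perturbation-style trainer on synthetic data, and the noise scale `sigma`, the number of re-trainings, and the per-example disagreement metric are placeholder choices rather than the paper's calibrated mechanisms or exact multiplicity metrics.

```python
# Minimal sketch (not the authors' code): estimate per-example predictive
# multiplicity by re-training many equally-configured "private" models.
# The noise scale sigma is illustrative and NOT calibrated to a formal
# (epsilon, delta) guarantee.
import numpy as np

def train_logreg_with_output_noise(X, y, sigma, seed, lr=0.1, epochs=200):
    """Fit logistic regression by gradient descent, then add Gaussian noise
    to the learned weights (output-perturbation-style trainer, toy version)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)       # gradient step on the mean log-loss
    return w + rng.normal(0.0, sigma, size=w.shape)

def per_example_disagreement(X, y, n_models=50, sigma=0.5):
    """For each example, the fraction of re-trained models whose prediction
    differs from the majority prediction across all re-trainings."""
    preds = np.stack([
        (X @ train_logreg_with_output_noise(X, y, sigma, seed) > 0).astype(int)
        for seed in range(n_models)
    ])                                          # shape (n_models, n_examples)
    majority = (preds.mean(axis=0) > 0.5).astype(int)
    return (preds != majority).mean(axis=0)

# Toy data: two overlapping Gaussian blobs, plus a bias column.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(200, 2)),
               rng.normal(1.0, 1.0, size=(200, 2))])
X = np.hstack([X, np.ones((len(X), 1))])
y = np.repeat([0.0, 1.0], 200)

d = per_example_disagreement(X, y)
print(f"mean disagreement: {d.mean():.3f}, most-affected example: {d.max():.3f}")
```

The paper's bound on how many re-trainings are needed for a reliable estimate would determine the number of models; the fixed `n_models=50` above is only a placeholder.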
Related papers
- LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification [67.92145284679623]
We propose a DP-based tutor that models the noised private distribution and controls sample generation at a low privacy cost.
We theoretically analyze our model's privacy protection and empirically verify our model.
arXiv Detail & Related papers (2024-02-26T11:52:55Z)
- Training Implicit Generative Models via an Invariant Statistical Loss [3.139474253994318]
Implicit generative models can learn arbitrarily complex data distributions.
On the downside, training requires distinguishing real data from artificially generated samples using adversarial discriminators.
We develop a discriminator-free method for training one-dimensional (1D) generative implicit models.
arXiv Detail & Related papers (2024-02-26T09:32:28Z)
- Quantification of Predictive Uncertainty via Inference-Time Sampling [57.749601811982096]
We propose a post-hoc sampling strategy for estimating predictive uncertainty that accounts for data ambiguity.
The method can generate different plausible outputs for a given input and does not assume parametric forms of predictive distributions.
arXiv Detail & Related papers (2023-08-03T12:43:21Z)
- Differentially Private Statistical Inference through $\beta$-Divergence One Posterior Sampling [2.8544822698499255]
We propose a posterior sampling scheme from a generalised posterior targeting the minimisation of the $\beta$-divergence between the model and the data generating process.
This provides private estimation that is generally applicable without requiring changes to the underlying model.
We show that $\beta$D-Bayes produces more precise inference estimation for the same privacy guarantees.
arXiv Detail & Related papers (2023-07-11T12:00:15Z)
- Training Private Models That Know What They Don't Know [40.19666295972155]
We find that several popular selective prediction approaches are ineffective in a differentially private setting.
We propose a novel evaluation mechanism which isolates selective prediction performance across model utility levels.
arXiv Detail & Related papers (2023-05-28T12:20:07Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
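As background for the blurb above, the step DP-SGD adds on top of ordinary SGD is per-example gradient clipping followed by Gaussian noise on the summed gradients. The numpy sketch below is a generic illustration, not code from the paper; the logistic loss, clip norm, and noise multiplier are assumed placeholders, and the noise is not calibrated to any specific (epsilon, delta) budget.

```python
# Minimal sketch (not from the paper): one DP-SGD step with per-example
# gradient clipping and Gaussian noise, for logistic regression in numpy.
import numpy as np

def dp_sgd_step(w, X_batch, y_batch, rng, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    """Clip each example's gradient, sum, add Gaussian noise scaled to the
    clip norm, then take an averaged gradient step."""
    p = 1.0 / (1.0 + np.exp(-X_batch @ w))                # predicted probabilities
    per_example_grads = (p - y_batch)[:, None] * X_batch  # shape (B, d)

    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)

    noisy_sum = clipped.sum(axis=0) + rng.normal(0.0, noise_mult * clip_norm,
                                                 size=w.shape)
    return w - lr * noisy_sum / len(X_batch)

# Toy usage on synthetic data (placeholder values throughout).
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(5)
for _ in range(100):
    w = dp_sgd_step(w, X, y, rng)
```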
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
- DPlis: Boosting Utility of Differentially Private Deep Learning via Randomized Smoothing [0.0]
We propose DPlis--Differentially Private Learning wIth Smoothing.
We show that DPlis can effectively boost model quality and training stability under a given privacy budget.
arXiv Detail & Related papers (2021-03-02T06:33:14Z)
- Private Prediction Sets [72.75711776601973]
Machine learning systems need reliable uncertainty quantification and protection of individuals' privacy.
We present a framework that treats these two desiderata jointly.
We evaluate the method on large-scale computer vision datasets.
arXiv Detail & Related papers (2021-02-11T18:59:11Z)
- MACE: A Flexible Framework for Membership Privacy Estimation in Generative Models [14.290199072565162]
We propose the first formal framework for membership privacy estimation in generative models.
Compared to previous works, our framework makes more realistic and flexible assumptions.
arXiv Detail & Related papers (2020-09-11T23:15:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.