Arbitrary Decisions are a Hidden Cost of Differentially Private Training
- URL: http://arxiv.org/abs/2302.14517v2
- Date: Mon, 15 May 2023 15:07:24 GMT
- Title: Arbitrary Decisions are a Hidden Cost of Differentially Private Training
- Authors: Bogdan Kulynych, Hsiang Hsu, Carmela Troncoso, Flavio P. Calmon
- Abstract summary: Mechanisms used in machine learning often aim to guarantee differential privacy (DP) during model training.
Practical DP-ensuring training methods use randomization when fitting model parameters to privacy-sensitive data.
For a given input example, the output predicted by equally-private models depends on the randomness used in training.
- Score: 7.560688419767116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mechanisms used in privacy-preserving machine learning often aim to guarantee
differential privacy (DP) during model training. Practical DP-ensuring training
methods use randomization when fitting model parameters to privacy-sensitive
data (e.g., adding Gaussian noise to clipped gradients). We demonstrate that
such randomization incurs predictive multiplicity: for a given input example,
the output predicted by equally-private models depends on the randomness used
in training. Thus, for a given input, the predicted output can vary drastically
if a model is re-trained, even if the same training dataset is used. The
predictive-multiplicity cost of DP training has not been studied, and is
currently neither audited for nor communicated to model designers and
stakeholders. We derive a bound on the number of re-trainings required to
estimate predictive multiplicity reliably. We analyze--both theoretically and
through extensive experiments--the predictive-multiplicity cost of three
DP-ensuring algorithms: output perturbation, objective perturbation, and
DP-SGD. We demonstrate that the degree of predictive multiplicity rises as the
level of privacy increases, and is unevenly distributed across individuals and
demographic groups in the data. Because randomness used to ensure DP during
training explains predictions for some examples, our results highlight a
fundamental challenge to the justifiability of decisions supported by
differentially private models in high-stakes settings. We conclude that
practitioners should audit the predictive multiplicity of their DP-ensuring
algorithms before deploying them in applications of individual-level
consequence.
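The abstract's claim can be checked operationally: re-train the same DP pipeline several times, vary only the training randomness, and measure how often each example's predicted label flips. The sketch below is a minimal illustration of such an audit and is not the authors' code: it uses a toy output-perturbation-style trainer on synthetic data, and the noise scale `sigma`, the number of re-trainings, and the per-example disagreement metric are placeholder choices rather than the paper's calibrated mechanisms or exact multiplicity metrics.

```python
# Minimal sketch (not the authors' code): estimate per-example predictive
# multiplicity by re-training many equally-configured "private" models.
# The noise scale sigma is illustrative and NOT calibrated to a formal
# (epsilon, delta) guarantee.
import numpy as np

def train_logreg_with_output_noise(X, y, sigma, seed, lr=0.1, epochs=200):
    """Fit logistic regression by gradient descent, then add Gaussian noise
    to the learned weights (output-perturbation-style trainer, toy version)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)       # gradient step on the mean log-loss
    return w + rng.normal(0.0, sigma, size=w.shape)

def per_example_disagreement(X, y, n_models=50, sigma=0.5):
    """For each example, the fraction of re-trained models whose prediction
    differs from the majority prediction across all re-trainings."""
    preds = np.stack([
        (X @ train_logreg_with_output_noise(X, y, sigma, seed) > 0).astype(int)
        for seed in range(n_models)
    ])                                          # shape (n_models, n_examples)
    majority = (preds.mean(axis=0) > 0.5).astype(int)
    return (preds != majority).mean(axis=0)

# Toy data: two overlapping Gaussian blobs, plus a bias column.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(200, 2)),
               rng.normal(1.0, 1.0, size=(200, 2))])
X = np.hstack([X, np.ones((len(X), 1))])
y = np.repeat([0.0, 1.0], 200)

d = per_example_disagreement(X, y)
print(f"mean disagreement: {d.mean():.3f}, most-affected example: {d.max():.3f}")
```

The paper's bound on how many re-trainings are needed for a reliable estimate would determine the number of models; the fixed `n_models=50` above is only a placeholder.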
Related papers
- LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification [67.92145284679623]
We propose a DP-based tutor that models the noised private distribution and controls sample generation at a low privacy cost.
We theoretically analyze our model's privacy protection and empirically verify our model.
arXiv Detail & Related papers (2024-02-26T11:52:55Z)
- Training Implicit Generative Models via an Invariant Statistical Loss [3.139474253994318]
Implicit generative models can learn arbitrarily complex data distributions.
On the downside, training requires distinguishing real data from artificially generated samples using adversarial discriminators.
We develop a discriminator-free method for training one-dimensional (1D) generative implicit models.
arXiv Detail & Related papers (2024-02-26T09:32:28Z)
- Quantification of Predictive Uncertainty via Inference-Time Sampling [57.749601811982096]
We propose a post-hoc sampling strategy for estimating predictive uncertainty that accounts for data ambiguity.
The method can generate different plausible outputs for a given input and does not assume parametric forms of predictive distributions.
arXiv Detail & Related papers (2023-08-03T12:43:21Z)
- Differentially Private Statistical Inference through $\beta$-Divergence One Posterior Sampling [2.8544822698499255]
We propose a posterior sampling scheme from a generalised posterior targeting the minimisation of the $\beta$-divergence between the model and the data generating process.
This provides private estimation that is generally applicable without requiring changes to the underlying model.
We show that $\beta$D-Bayes produces more precise inference estimation for the same privacy guarantees.
arXiv Detail & Related papers (2023-07-11T12:00:15Z)
- Training Private Models That Know What They Don't Know [40.19666295972155]
We find that several popular selective prediction approaches are ineffective in a differentially private setting.
We propose a novel evaluation mechanism which isolates selective prediction performance across model utility levels.
arXiv Detail & Related papers (2023-05-28T12:20:07Z)
- Large Scale Transfer Learning for Differentially Private Image Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy.
Private training using DP-SGD protects against leakage by injecting noise into individual example gradients.
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
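As background for the blurb above, the step DP-SGD adds on top of ordinary SGD is per-example gradient clipping followed by Gaussian noise on the summed gradients. The numpy sketch below is a generic illustration, not code from the paper; the logistic loss, clip norm, and noise multiplier are assumed placeholders, and the noise is not calibrated to any specific (epsilon, delta) budget.

```python
# Minimal sketch (not from the paper): one DP-SGD step with per-example
# gradient clipping and Gaussian noise, for logistic regression in numpy.
import numpy as np

def dp_sgd_step(w, X_batch, y_batch, rng, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    """Clip each example's gradient, sum, add Gaussian noise scaled to the
    clip norm, then take an averaged gradient step."""
    p = 1.0 / (1.0 + np.exp(-X_batch @ w))                # predicted probabilities
    per_example_grads = (p - y_batch)[:, None] * X_batch  # shape (B, d)

    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)

    noisy_sum = clipped.sum(axis=0) + rng.normal(0.0, noise_mult * clip_norm,
                                                 size=w.shape)
    return w - lr * noisy_sum / len(X_batch)

# Toy usage on synthetic data (placeholder values throughout).
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(5)
for _ in range(100):
    w = dp_sgd_step(w, X, y, rng)
```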
arXiv Detail & Related papers (2022-05-06T01:22:20Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
arXiv Detail & Related papers (2021-10-12T01:45:27Z)
- DPlis: Boosting Utility of Differentially Private Deep Learning via Randomized Smoothing [0.0]
We propose DPlis--Differentially Private Learning wIth Smoothing.
We show that DPlis can effectively boost model quality and training stability under a given privacy budget.
arXiv Detail & Related papers (2021-03-02T06:33:14Z)
- Private Prediction Sets [72.75711776601973]
Machine learning systems need reliable uncertainty quantification and protection of individuals' privacy.
We present a framework that treats these two desiderata jointly.
We evaluate the method on large-scale computer vision datasets.
arXiv Detail & Related papers (2021-02-11T18:59:11Z)
- MACE: A Flexible Framework for Membership Privacy Estimation in Generative Models [14.290199072565162]
We propose the first formal framework for membership privacy estimation in generative models.
Compared to previous works, our framework makes more realistic and flexible assumptions.
arXiv Detail & Related papers (2020-09-11T23:15:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.