Recycling Scraps: Improving Private Learning by Leveraging Intermediate Checkpoints
- URL: http://arxiv.org/abs/2210.01864v2
- Date: Tue, 17 Sep 2024 05:19:09 GMT
- Title: Recycling Scraps: Improving Private Learning by Leveraging Intermediate Checkpoints
- Authors: Virat Shejwalkar, Arun Ganesh, Rajiv Mathews, Yarong Mu, Shuang Song, Om Thakkar, Abhradeep Thakurta, Xinyi Zheng
- Abstract summary: We design a general framework that uses aggregates of intermediate checkpoints \emph{during training} to increase the accuracy of DP ML techniques.
We demonstrate that training over aggregates can provide significant gains in prediction accuracy over the existing state-of-the-art for StackOverflow, CIFAR10 and CIFAR100 datasets.
Our methods achieve relative improvements of 0.54% and 62.6% in terms of utility and variance, on a proprietary, production-grade pCVR task.
- Score: 20.533039211835902
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work, we focus on improving the accuracy-variance trade-off for state-of-the-art differentially private machine learning (DP ML) methods. First, we design a general framework that uses aggregates of intermediate checkpoints \emph{during training} to increase the accuracy of DP ML techniques. Specifically, we demonstrate that training over aggregates can provide significant gains in prediction accuracy over the existing state-of-the-art for StackOverflow, CIFAR10 and CIFAR100 datasets. For instance, we improve the state-of-the-art DP StackOverflow accuracies to 22.74\% (+2.06\% relative) for $\epsilon=8.2$, and 23.90\% (+2.09\%) for $\epsilon=18.9$. Furthermore, these gains magnify in settings with periodically varying training data distributions. We also demonstrate that our methods achieve relative improvements of 0.54\% and 62.6\% in terms of utility and variance, on a proprietary, production-grade pCVR task. Lastly, we initiate an exploration into estimating the uncertainty (variance) that DP noise adds to the predictions of DP ML models. We prove that, under standard assumptions on the loss function, the sample variance from the last few checkpoints provides a good approximation of the variance of the final model of a DP run. Empirically, we show that the last few checkpoints can provide a reasonable lower bound for the variance of a converged DP model. Crucially, all the methods proposed in this paper operate on \emph{a single training run} of the DP ML technique, thus incurring no additional privacy cost.
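Below is a minimal, self-contained sketch of the two ideas in the abstract: aggregating intermediate checkpoints from a single DP run, and using the sample variance of the last few checkpoints to estimate the variance of the final model. The simplified DP-SGD loop, the toy quadratic objective, and all hyperparameters (`clip_norm`, `noise_mult`, `k`) are illustrative stand-ins, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(params, grad_fn, clip_norm=1.0, noise_mult=1.0, lr=0.1):
    """One simplified DP-SGD step: clip the gradient to bound sensitivity
    (a stand-in for per-example clipping), then add Gaussian noise."""
    g = grad_fn(params)
    g = g / max(1.0, np.linalg.norm(g) / clip_norm)
    g = g + rng.normal(0.0, noise_mult * clip_norm, size=g.shape)
    return params - lr * g

# Toy objective: quadratic loss with minimum at w_star.
w_star = np.array([1.0, -2.0, 0.5])
grad_fn = lambda w: w - w_star

params = np.zeros(3)
checkpoints = []
for step in range(500):
    params = dp_sgd_step(params, grad_fn)
    checkpoints.append(params.copy())  # saved anyway during the single DP run

k = 50
tail = np.stack(checkpoints[-k:])

# (1) Checkpoint aggregation: average the last k checkpoints and use the
#     result for inference.
aggregated = tail.mean(axis=0)

# (2) Uncertainty estimate: the per-coordinate sample variance of the last
#     k checkpoints approximates (lower-bounds) the variance DP noise
#     induces in the final model.
variance_estimate = tail.var(axis=0, ddof=1)

print("last iterate     :", checkpoints[-1])
print("aggregated model :", aggregated)
print("variance estimate:", variance_estimate)
```

Both operations act only on checkpoints produced by one DP-SGD run, so they are post-processing of an already-private output; this is why the abstract can claim no additional privacy cost.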
Related papers
- Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method to gauge the degree of security provided to the models.
DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data.
Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z)
- DP-SGD-Global-Adapt-V2-S: Triad Improvements of Privacy, Accuracy and Fairness via Step Decay Noise Multiplier and Step Decay Upper Clipping Threshold [1.8741812670237725]
DP-SGD's per-sample gradient clipping and uniform noise addition during training can significantly degrade model utility and fairness.
We develop DP-SGD-Global-Adapt-V2-S, which has a step-decay noise multiplier and an upper clipping threshold that is also decayed step-wise.
It improves accuracy by 0.9795%, 0.6786%, and 4.0130% on MNIST, CIFAR10, and CIFAR100, respectively.
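As a rough illustration, here is what the two step-decay schedules might look like inside a DP-SGD update; the decay factor, period, and initial values below are hypothetical, not the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def step_decay(initial, decay_factor, period, step):
    """Piecewise-constant decay: shrink by decay_factor every `period` steps."""
    return initial * decay_factor ** (step // period)

SIGMA0, CLIP0 = 2.0, 1.5  # initial noise multiplier and clipping threshold (assumed)

def dp_sgd_update(params, grad, step, lr=0.05):
    sigma = step_decay(SIGMA0, 0.9, 100, step)  # step-decay noise multiplier
    clip = step_decay(CLIP0, 0.9, 100, step)    # step-decay upper clipping threshold
    grad = grad / max(1.0, np.linalg.norm(grad) / clip)  # clip to current threshold
    grad = grad + rng.normal(0.0, sigma * clip, size=grad.shape)
    return params - lr * grad
```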
arXiv Detail & Related papers (2023-12-05T00:09:57Z)
- Local and adaptive mirror descents in extensive-form games [37.04094644847904]
We study how to learn $\epsilon$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback.
We consider a fixed sampling approach, where players still update their policies over time, but with observations obtained through a given fixed sampling policy.
We show that this approach guarantees a convergence rate of $\tilde{\mathcal{O}}(T^{-1/2})$ with high probability and has a near-optimal dependence on the game parameters.
arXiv Detail & Related papers (2023-09-01T09:20:49Z)
- Differentially Private Image Classification from Features [53.75086935617644]
Leveraging transfer learning has been shown to be an effective strategy for training large models with Differential Privacy.
Recent works have found that privately training just the last layer of a pre-trained model provides the best utility with DP.
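A minimal sketch of the last-layer recipe these works describe: freeze a pre-trained feature extractor and run DP-SGD only on a linear classifier over its features. The random features, softmax-regression head, and hyperparameters below are placeholders for a real pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen features from a pre-trained backbone; random stand-ins here.
n, d, num_classes = 256, 64, 10
features = rng.normal(size=(n, d))
labels = rng.integers(0, num_classes, size=n)

W = np.zeros((d, num_classes))  # the only trainable (private) parameters
clip, sigma, lr = 1.0, 1.0, 0.5

def per_example_grad(W, x, y):
    """Softmax-regression gradient for a single example."""
    logits = x @ W
    p = np.exp(logits - logits.max())
    p /= p.sum()
    p[y] -= 1.0
    return np.outer(x, p)

for epoch in range(5):
    grad_sum = np.zeros_like(W)
    for x, y in zip(features, labels):
        g = per_example_grad(W, x, y)
        g /= max(1.0, np.linalg.norm(g) / clip)  # per-example clipping
        grad_sum += g
    grad_sum += rng.normal(0.0, sigma * clip, size=W.shape)  # one noise draw per batch
    W -= lr * grad_sum / n
```

Because only the small linear head is trained privately, the clipped-gradient dimension is modest, which is one intuition for why this setup gives good utility under DP.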
arXiv Detail & Related papers (2022-11-24T04:04:20Z)
- Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies [21.739165607822184]
Differentially private (DP) mechanisms protect individual-level information by introducing randomness into the statistical analysis procedure.
We examine a DP bootstrap procedure that releases multiple private bootstrap estimates to infer the sampling distribution and construct confidence intervals (CIs).
We derive CIs for tasks such as population mean estimation, logistic regression, and quantile regression, and we compare them to existing methods using simulations and real-world experiments on 2016 Canada Census data.
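To make the procedure concrete, here is a toy version for population mean estimation: release multiple noisy bootstrap means via the Gaussian mechanism and form a percentile CI. The data bound and noise scale are assumptions, and the paper's privacy accounting over the composed releases is more careful than this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=1000)  # assume values bounded in [0, 1]

n = len(data)
B = 100                 # number of private bootstrap releases (assumed)
sigma = 2.0             # noise multiplier, covering composition over B releases (assumed)
sensitivity = 1.0 / n   # sensitivity of the mean on [0, 1]-bounded data

estimates = []
for _ in range(B):
    sample = rng.choice(data, size=n, replace=True)  # bootstrap resample
    noisy_mean = sample.mean() + rng.normal(0.0, sigma * sensitivity)
    estimates.append(noisy_mean)

# Percentile confidence interval from the private bootstrap estimates.
lo, hi = np.percentile(estimates, [2.5, 97.5])
print(f"95% CI for the population mean: [{lo:.4f}, {hi:.4f}]")
```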
arXiv Detail & Related papers (2022-10-12T12:48:25Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
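A compact sketch of the ATC idea: on labeled source data, pick the confidence threshold whose exceedance rate equals the source accuracy, then report the fraction of unlabeled target confidences above it. The synthetic confidences below are placeholders for a real model's outputs.

```python
import numpy as np

def learn_atc_threshold(source_conf, source_correct):
    """Choose t so that the fraction of source confidences above t
    matches the source accuracy (the (1 - acc)-quantile does this)."""
    acc = source_correct.mean()
    return np.quantile(source_conf, 1.0 - acc)

def atc_predicted_accuracy(target_conf, t):
    """Predicted target accuracy: fraction of unlabeled examples with conf > t."""
    return float((target_conf > t).mean())

rng = np.random.default_rng(0)
source_conf = rng.beta(5, 2, size=5000)           # stand-in model confidences
source_correct = rng.random(5000) < source_conf   # stand-in correctness labels
target_conf = rng.beta(4, 2, size=5000)           # shifted, unlabeled target

t = learn_atc_threshold(source_conf, source_correct)
print("threshold:", t)
print("predicted target accuracy:", atc_predicted_accuracy(target_conf, t))
```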
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times [51.61278695776151]
Federated Learning (FL) is well known for its privacy protection when training machine learning models among distributed clients collaboratively.
Recent studies have pointed out that naive FL is susceptible to gradient leakage attacks.
Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks.
arXiv Detail & Related papers (2021-01-11T19:43:12Z)
- Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate the empirical advantages of our algorithms over standard gradient methods on two popular deep learning tasks.
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We propose a method, which we call prediction-time batch normalization, that significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
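The mechanism is simple enough to show directly: at test time, replace the stored training statistics in each batch-norm layer with statistics computed from the test batch itself. A single-layer numpy illustration follows, with made-up running statistics and shift parameters.

```python
import numpy as np

def batchnorm(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
train_mean, train_var = 0.0, 1.0  # running stats frozen at training time

# Test batch under covariate shift: its distribution differs from training.
test_batch = rng.normal(loc=2.0, scale=3.0, size=(128, 16))

# Standard inference: normalize with the stale training statistics.
standard = batchnorm(test_batch, train_mean, train_var)

# Prediction-time BN: recompute statistics from the test batch itself.
adapted = batchnorm(test_batch, test_batch.mean(axis=0), test_batch.var(axis=0))

print("output std, training stats    :", round(float(standard.std()), 3))  # ~3, mis-scaled
print("output std, prediction-time BN:", round(float(adapted.std()), 3))   # ~1, re-normalized
```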
arXiv Detail & Related papers (2020-06-19T05:08:43Z)