Recycling Scraps: Improving Private Learning by Leveraging Intermediate
Checkpoints
- URL: http://arxiv.org/abs/2210.01864v1
- Date: Tue, 4 Oct 2022 19:21:00 GMT
- Title: Recycling Scraps: Improving Private Learning by Leveraging Intermediate
Checkpoints
- Authors: Virat Shejwalkar, Arun Ganesh, Rajiv Mathews, Om Thakkar, Abhradeep
Thakurta
- Abstract summary: This work explores various methods that aggregate intermediate checkpoints to improve the utility of DP training.
We show that checkpoint aggregations provide significant gains in prediction accuracy over the existing SOTA for the CIFAR10 and StackOverflow datasets.
Finally, we show that the sample variance over the last few checkpoints provides a good approximation of the variance of the final model of a DP run.
- Score: 17.654346227497403
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: All state-of-the-art (SOTA) differentially private machine learning (DP ML)
methods are iterative in nature, and their privacy analyses allow publicly
releasing the intermediate training checkpoints. However, DP ML benchmarks, and
even practical deployments, typically use only the final training checkpoint to
make predictions. In this work, for the first time, we comprehensively explore
various methods that aggregate intermediate checkpoints to improve the utility
of DP training. Empirically, we demonstrate that checkpoint aggregations
provide significant gains in prediction accuracy over the existing SOTA for the
CIFAR10 and StackOverflow datasets, and that these gains get magnified in
settings with periodically varying training data distributions. For instance,
we improve SOTA StackOverflow accuracies to 22.7% (+0.43% absolute) for
$\epsilon=8.2$, and 23.84% (+0.43%) for $\epsilon=18.9$. Theoretically, we show
that uniform tail averaging of checkpoints improves the empirical risk
minimization bound compared to the last checkpoint of DP-SGD. Lastly, we
initiate an exploration into estimating the uncertainty that DP noise adds to
the predictions of DP ML models. We prove that, under standard assumptions on
the loss function, the sample variance over the last few checkpoints provides a
good approximation of the variance of the final model of a DP run. Empirically,
we show that the last few checkpoints can provide a reasonable lower bound for
the variance of a converged DP model.
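Both aggregation ideas are cheap to implement because the privacy analysis of DP-SGD already covers every intermediate checkpoint, so post-processing them costs no extra privacy budget. Below is a minimal numpy sketch of uniform tail averaging and the checkpoint-based variance estimate; the function names and the toy Gaussian noise used to simulate checkpoints are illustrative assumptions, not the authors' code.

```python
import numpy as np

def tail_average(checkpoints, k):
    """Uniform tail averaging: average the last k parameter checkpoints.
    Pure post-processing, so it adds no privacy cost to the DP-SGD run."""
    tail = np.stack(checkpoints[-k:])          # shape: (k, num_params)
    return tail.mean(axis=0)

def checkpoint_variance(checkpoints, k):
    """Per-coordinate sample variance over the last k checkpoints, used as
    a proxy for the variance DP noise induces in the final model."""
    tail = np.stack(checkpoints[-k:])
    return tail.var(axis=0, ddof=1)            # unbiased sample variance

# Toy usage: each "checkpoint" is a flattened parameter vector; Gaussian
# noise stands in for the randomness of a real DP-SGD trajectory.
rng = np.random.default_rng(0)
checkpoints = [rng.normal(scale=0.1, size=10) for _ in range(100)]
avg_model = tail_average(checkpoints, k=20)
var_estimate = checkpoint_variance(checkpoints, k=20)
```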
Related papers
- Exploiting Pre-trained Models for Drug Target Affinity Prediction with Nearest Neighbors [58.661454334877256]
Drug-Target binding Affinity (DTA) prediction is essential for drug discovery.
Despite the application of deep learning methods to DTA prediction, the achieved accuracy remains suboptimal.
We propose $k$NN-DTA, a non-representation, embedding-based retrieval method built on top of a pre-trained DTA prediction model.
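As a rough illustration of the retrieval idea, the sketch below blends a pre-trained model's prediction with a distance-weighted average of the nearest training affinities; the interpolation weight `lam`, the kernel, and all names are assumptions, not the paper's exact formulation.

```python
import numpy as np

def knn_dta_predict(query_emb, train_embs, train_affinities,
                    model_pred, k=8, lam=0.5):
    """Blend the pre-trained model's affinity prediction with a
    distance-weighted average of the k nearest training affinities."""
    dists = np.linalg.norm(train_embs - query_emb, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest])          # closer neighbours count more
    knn_pred = np.average(train_affinities[nearest], weights=weights)
    return lam * model_pred + (1.0 - lam) * knn_pred

# Toy usage with random embeddings standing in for pre-trained features.
rng = np.random.default_rng(0)
embs, affs = rng.normal(size=(100, 16)), rng.normal(size=100)
pred = knn_dta_predict(rng.normal(size=16), embs, affs, model_pred=0.3)
```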
arXiv Detail & Related papers (2024-07-21T15:49:05Z)
- Black Box Differential Privacy Auditing Using Total Variation Distance [3.830092569453011]
We present a practical method to audit the differential privacy guarantees of a machine learning model using a small hold-out dataset.
Our method estimates the total variation (TV) distance between scores obtained with a subset of the training data and the hold-out dataset.
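A minimal sketch of the estimation step, assuming the "scores" are scalar per-example values (e.g., losses): the empirical TV distance between the two score samples is computed over a histogram with shared bins. How that estimate is converted into a privacy lower bound follows the paper's analysis and is omitted here.

```python
import numpy as np

def empirical_tv_distance(train_scores, holdout_scores, bins=30):
    """Estimate the TV distance between two score distributions from
    samples, using a histogram with shared bin edges."""
    lo = min(train_scores.min(), holdout_scores.min())
    hi = max(train_scores.max(), holdout_scores.max())
    p, edges = np.histogram(train_scores, bins=bins, range=(lo, hi))
    q, _ = np.histogram(holdout_scores, bins=edges)
    p = p / p.sum()                            # normalize counts to probabilities
    q = q / q.sum()
    return 0.5 * np.abs(p - q).sum()           # TV = half the L1 distance
```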
arXiv Detail & Related papers (2024-06-07T10:52:15Z)
- ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging [7.0626076422397475]
We take the training loss and validation loss as proxies of bias and variance and use them to guide early stopping and checkpoint averaging.
When evaluated with advanced ASR models, our recipe provides 2.5%-3.7% and 3.1%-4.6% CER reductions.
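A hedged sketch of the idea, using the sum of training and validation loss as a stand-in for the paper's approximated bias-variance tradeoff score; the actual ApproBiVT criterion and checkpoint-selection rule differ.

```python
def select_checkpoints(history, patience=3):
    """history: list of (train_loss, val_loss) per epoch. Stop once the
    combined score has not improved for `patience` epochs, then return
    the indices of checkpoints around the best epoch to average."""
    scores = [tr + va for tr, va in history]   # stand-in tradeoff score
    best, best_epoch, stale = float("inf"), 0, 0
    for epoch, score in enumerate(scores):
        if score < best:
            best, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= patience:              # early stop
                break
    return list(range(max(0, best_epoch - 2), best_epoch + 1))
```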
arXiv Detail & Related papers (2023-08-05T12:50:54Z)
- AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation [64.9230895853942]
Domain generalization can be arbitrarily hard without exploiting target domain information.
Test-time adaptive (TTA) methods are proposed to address this issue.
In this work, we adopt a Non-Parametric Classifier to perform test-time Adaptation (AdaNPC).
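A minimal sketch of a non-parametric test-time adapter, assuming a memory of labelled features queried by k-nearest-neighbour voting and grown with confident test predictions; the class name and confidence rule are illustrative, not the paper's exact algorithm.

```python
import numpy as np

class NonParametricAdapter:
    """Memory of (feature, label) pairs, queried by k-NN majority vote;
    confident test predictions are absorbed into the memory."""
    def __init__(self, feats, labels, k=5, conf_thresh=0.8):
        self.feats, self.labels = feats, labels
        self.k, self.conf_thresh = k, conf_thresh

    def predict(self, x):
        dists = np.linalg.norm(self.feats - x, axis=1)
        nearest = np.argsort(dists)[:self.k]
        votes = np.bincount(self.labels[nearest])
        pred, conf = int(votes.argmax()), votes.max() / self.k
        if conf >= self.conf_thresh:           # test-time adaptation step
            self.feats = np.vstack([self.feats, x])
            self.labels = np.append(self.labels, pred)
        return pred
```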
arXiv Detail & Related papers (2023-04-25T04:23:13Z)
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of their predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error in both in-domain and out-of-domain scenarios.
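As a toy illustration only: an auxiliary penalty tying each predicted box's confidence to a 0/1 correctness indicator. The paper's actual train-time loss is formulated differently; this squared-gap version just shows the shape of such a term.

```python
import numpy as np

def calibration_aux_loss(confidences, is_correct):
    """Squared gap between each predicted box's class confidence and a
    0/1 indicator of whether the prediction was correct; added (scaled)
    to the usual detection loss during training."""
    confidences = np.asarray(confidences, dtype=float)
    is_correct = np.asarray(is_correct, dtype=float)
    return float(np.mean((confidences - is_correct) ** 2))
```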
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
- Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies [28.95350475681164]
Differentially private (DP) mechanisms protect individual-level information by introducing randomness into the statistical analysis procedure.
We examine a DP bootstrap procedure that releases multiple private bootstrap estimates to infer the sampling distribution and construct confidence intervals (CIs).
Our privacy analysis presents new results on the privacy cost of a single DP bootstrap estimate, applicable to any DP mechanisms, and identifies some misapplications of the bootstrap in the existing literature.
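A minimal sketch of releasing multiple privatized bootstrap estimates, assuming a Gaussian mechanism on each bootstrap mean; the noise scale, the naive percentile interval at the end, and the privacy accounting are simplifications of the paper's analysis.

```python
import numpy as np

def dp_bootstrap_estimates(data, num_boots=200, noise_scale=0.05, seed=0):
    """Release num_boots bootstrap means, each privatized with Gaussian
    noise (the privacy accounting itself is the paper's contribution)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    out = []
    for _ in range(num_boots):
        sample = rng.choice(data, size=n, replace=True)   # bootstrap resample
        out.append(sample.mean() + rng.normal(scale=noise_scale))
    return np.array(out)

# Naive percentile CI from the noisy estimates; the paper instead uses an
# inference strategy that accounts for the added DP noise.
estimates = dp_bootstrap_estimates(np.random.default_rng(1).normal(size=1000))
ci_low, ci_high = np.percentile(estimates, [2.5, 97.5])
```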
arXiv Detail & Related papers (2022-10-12T12:48:25Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
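ATC is simple enough to state in a few lines. Assuming scalar confidence scores, the threshold is chosen on labeled source validation data so that the fraction of examples above it matches the source accuracy; target accuracy is then predicted as the fraction of unlabeled target examples above that threshold. Variable names here are ours.

```python
import numpy as np

def learn_atc_threshold(val_confidences, val_correct):
    """Choose t so the fraction of source-validation examples with
    confidence above t matches the source-validation accuracy."""
    source_acc = np.mean(val_correct)
    return np.quantile(val_confidences, 1.0 - source_acc)

def predict_target_accuracy(target_confidences, t):
    """ATC estimate: fraction of unlabeled target examples above t."""
    return np.mean(target_confidences > t)
```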
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times [51.61278695776151]
Federated Learning (FL) is well known for its privacy protection when training machine learning models collaboratively among distributed clients.
Recent studies have pointed out that naive FL is susceptible to gradient leakage attacks.
Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks.
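For context, a minimal sketch of the DP mechanism being tuned: one federated-averaging round that clips each client's update and adds Gaussian noise. The clipping norm and noise multiplier are placeholder values; the paper's contribution, choosing the number of iterations under a privacy budget, is not shown.

```python
import numpy as np

def dp_fedavg_round(global_params, client_updates, clip_norm=1.0,
                    noise_mult=1.0, seed=0):
    """One federated-averaging round with DP: clip each client's model
    delta, average, and add Gaussian noise scaled to the clipping norm."""
    rng = np.random.default_rng(seed)
    clipped = [delta * min(1.0, clip_norm / max(np.linalg.norm(delta), 1e-12))
               for delta in client_updates]
    avg_update = np.mean(clipped, axis=0)
    noise = rng.normal(scale=noise_mult * clip_norm / len(clipped),
                       size=avg_update.shape)
    return global_params + avg_update + noise
```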
arXiv Detail & Related papers (2021-01-11T19:43:12Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and
Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate the empirical advantages of the proposed methods over standard gradient methods on two popular deep learning tasks.
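A minimal sketch of the DP gradient step such algorithms build on: per-example clipping, averaging, and Gaussian noise; adaptive variants would feed the noisy gradient into Adam-style moment estimates. All names and constants here are illustrative.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip=1.0,
                noise_mult=1.0, seed=0):
    """One DP-SGD step: clip each per-example gradient, average, add
    Gaussian noise. Adaptive variants would feed this noisy gradient
    into Adam-style moment estimates instead of a plain update."""
    rng = np.random.default_rng(seed)
    clipped = [g * min(1.0, clip / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    noisy_grad = (np.mean(clipped, axis=0)
                  + rng.normal(scale=noise_mult * clip / len(clipped),
                               size=params.shape))
    return params - lr * noisy_grad
```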
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.