Recycling Scraps: Improving Private Learning by Leveraging Intermediate
Checkpoints
- URL: http://arxiv.org/abs/2210.01864v1
- Date: Tue, 4 Oct 2022 19:21:00 GMT
- Title: Recycling Scraps: Improving Private Learning by Leveraging Intermediate
Checkpoints
- Authors: Virat Shejwalkar, Arun Ganesh, Rajiv Mathews, Om Thakkar, Abhradeep
Thakurta
- Abstract summary: This work explores various methods that aggregate intermediate checkpoints to improve the utility of DP training.
We show that checkpoint aggregations provide significant gains in prediction accuracy over the existing SOTA for the CIFAR10 and StackOverflow datasets.
Finally, we show that the sample variance over the last few checkpoints provides a good approximation of the variance of the final model of a DP run.
- Score: 17.654346227497403
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: All state-of-the-art (SOTA) differentially private machine learning (DP ML)
methods are iterative in nature, and their privacy analyses allow publicly
releasing the intermediate training checkpoints. However, DP ML benchmarks, and
even practical deployments, typically use only the final training checkpoint to
make predictions. In this work, for the first time, we comprehensively explore
various methods that aggregate intermediate checkpoints to improve the utility
of DP training. Empirically, we demonstrate that checkpoint aggregations
provide significant gains in prediction accuracy over the existing SOTA for the
CIFAR10 and StackOverflow datasets, and that these gains get magnified in
settings with periodically varying training data distributions. For instance,
we improve SOTA StackOverflow accuracies to 22.7% (+0.43% absolute) for
$\epsilon=8.2$, and 23.84% (+0.43%) for $\epsilon=18.9$. Theoretically, we show
that uniform tail averaging of checkpoints improves the empirical risk
minimization bound compared to the last checkpoint of DP-SGD. Lastly, we
initiate an exploration into estimating the uncertainty that DP noise adds to
the predictions of DP ML models. We prove that, under standard assumptions on
the loss function, the sample variance over the last few checkpoints provides a
good approximation of the variance of the final model of a DP run. Empirically,
we show that the last few checkpoints can provide a reasonable lower bound for
the variance of a converged DP model.
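Both aggregation ideas are cheap to implement because the privacy analysis of DP-SGD already covers every intermediate checkpoint, so post-processing them costs no extra privacy budget. Below is a minimal numpy sketch of uniform tail averaging and the checkpoint-based variance estimate; the function names and the toy Gaussian noise used to simulate checkpoints are illustrative assumptions, not the authors' code.

```python
import numpy as np

def tail_average(checkpoints, k):
    """Uniform tail averaging: average the last k parameter checkpoints.
    Pure post-processing, so it adds no privacy cost to the DP-SGD run."""
    tail = np.stack(checkpoints[-k:])          # shape: (k, num_params)
    return tail.mean(axis=0)

def checkpoint_variance(checkpoints, k):
    """Per-coordinate sample variance over the last k checkpoints, used as
    a proxy for the variance DP noise induces in the final model."""
    tail = np.stack(checkpoints[-k:])
    return tail.var(axis=0, ddof=1)            # unbiased sample variance

# Toy usage: each "checkpoint" is a flattened parameter vector; Gaussian
# noise stands in for the randomness of a real DP-SGD trajectory.
rng = np.random.default_rng(0)
checkpoints = [rng.normal(scale=0.1, size=10) for _ in range(100)]
avg_model = tail_average(checkpoints, k=20)
var_estimate = checkpoint_variance(checkpoints, k=20)
```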
Related papers
- Exploiting Pre-trained Models for Drug Target Affinity Prediction with Nearest Neighbors [58.661454334877256]
Drug-Target binding Affinity (DTA) prediction is essential for drug discovery.
Despite the application of deep learning methods to DTA prediction, the achieved accuracy remains suboptimal.
We propose $k$NN-DTA, a non-representation, embedding-based retrieval method built on top of a pre-trained DTA prediction model.
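As a rough illustration of the retrieval idea, the sketch below blends a pre-trained model's prediction with a distance-weighted average of the nearest training affinities; the interpolation weight `lam`, the kernel, and all names are assumptions, not the paper's exact formulation.

```python
import numpy as np

def knn_dta_predict(query_emb, train_embs, train_affinities,
                    model_pred, k=8, lam=0.5):
    """Blend the pre-trained model's affinity prediction with a
    distance-weighted average of the k nearest training affinities."""
    dists = np.linalg.norm(train_embs - query_emb, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest])          # closer neighbours count more
    knn_pred = np.average(train_affinities[nearest], weights=weights)
    return lam * model_pred + (1.0 - lam) * knn_pred

# Toy usage with random embeddings standing in for pre-trained features.
rng = np.random.default_rng(0)
embs, affs = rng.normal(size=(100, 16)), rng.normal(size=100)
pred = knn_dta_predict(rng.normal(size=16), embs, affs, model_pred=0.3)
```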
arXiv Detail & Related papers (2024-07-21T15:49:05Z)
- Black Box Differential Privacy Auditing Using Total Variation Distance [3.830092569453011]
We present a practical method to audit the differential privacy guarantees of a machine learning model using a small hold-out dataset.
Our method estimates the total variation (TV) distance between scores obtained with a subset of the training data and the hold-out dataset.
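A minimal sketch of the estimation step, assuming the "scores" are scalar per-example values (e.g., losses): the empirical TV distance between the two score samples is computed over a histogram with shared bins. How that estimate is converted into a privacy lower bound follows the paper's analysis and is omitted here.

```python
import numpy as np

def empirical_tv_distance(train_scores, holdout_scores, bins=30):
    """Estimate the TV distance between two score distributions from
    samples, using a histogram with shared bin edges."""
    lo = min(train_scores.min(), holdout_scores.min())
    hi = max(train_scores.max(), holdout_scores.max())
    p, edges = np.histogram(train_scores, bins=bins, range=(lo, hi))
    q, _ = np.histogram(holdout_scores, bins=edges)
    p = p / p.sum()                            # normalize counts to probabilities
    q = q / q.sum()
    return 0.5 * np.abs(p - q).sum()           # TV = half the L1 distance
```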
arXiv Detail & Related papers (2024-06-07T10:52:15Z)
- ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging [7.0626076422397475]
We take the training loss and validation loss as proxies of bias and variance and use them to guide early stopping and checkpoint averaging.
When evaluated with advanced ASR models, our recipe provides 2.5%-3.7% and 3.1%-4.6% CER reductions.
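A hedged sketch of the idea, using the sum of training and validation loss as a stand-in for the paper's approximated bias-variance tradeoff score; the actual ApproBiVT criterion and checkpoint-selection rule differ.

```python
def select_checkpoints(history, patience=3):
    """history: list of (train_loss, val_loss) per epoch. Stop once the
    combined score has not improved for `patience` epochs, then return
    the indices of checkpoints around the best epoch to average."""
    scores = [tr + va for tr, va in history]   # stand-in tradeoff score
    best, best_epoch, stale = float("inf"), 0, 0
    for epoch, score in enumerate(scores):
        if score < best:
            best, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= patience:              # early stop
                break
    return list(range(max(0, best_epoch - 2), best_epoch + 1))
```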
arXiv Detail & Related papers (2023-08-05T12:50:54Z)
- AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation [64.9230895853942]
Domain generalization can be arbitrarily hard without exploiting target domain information.
Test-time adaptive (TTA) methods are proposed to address this issue.
In this work, we adopt a Non-Parametric Classifier to perform test-time Adaptation (AdaNPC).
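A minimal sketch of a non-parametric test-time adapter, assuming a memory of labelled features queried by k-nearest-neighbour voting and grown with confident test predictions; the class name and confidence rule are illustrative, not the paper's exact algorithm.

```python
import numpy as np

class NonParametricAdapter:
    """Memory of (feature, label) pairs, queried by k-NN majority vote;
    confident test predictions are absorbed into the memory."""
    def __init__(self, feats, labels, k=5, conf_thresh=0.8):
        self.feats, self.labels = feats, labels
        self.k, self.conf_thresh = k, conf_thresh

    def predict(self, x):
        dists = np.linalg.norm(self.feats - x, axis=1)
        nearest = np.argsort(dists)[:self.k]
        votes = np.bincount(self.labels[nearest])
        pred, conf = int(votes.argmax()), votes.max() / self.k
        if conf >= self.conf_thresh:           # test-time adaptation step
            self.feats = np.vstack([self.feats, x])
            self.labels = np.append(self.labels, pred)
        return pred
```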
arXiv Detail & Related papers (2023-04-25T04:23:13Z)
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of their predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error in both in-domain and out-of-domain scenarios.
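As a toy illustration only: an auxiliary penalty tying each predicted box's confidence to a 0/1 correctness indicator. The paper's actual train-time loss is formulated differently; this squared-gap version just shows the shape of such a term.

```python
import numpy as np

def calibration_aux_loss(confidences, is_correct):
    """Squared gap between each predicted box's class confidence and a
    0/1 indicator of whether the prediction was correct; added (scaled)
    to the usual detection loss during training."""
    confidences = np.asarray(confidences, dtype=float)
    is_correct = np.asarray(is_correct, dtype=float)
    return float(np.mean((confidences - is_correct) ** 2))
```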
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
- Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies [28.95350475681164]
Differentially private (DP) mechanisms protect individual-level information by introducing randomness into the statistical analysis procedure.
We examine a DP bootstrap procedure that releases multiple private bootstrap estimates to infer the sampling distribution and construct confidence intervals (CIs).
Our privacy analysis presents new results on the privacy cost of a single DP bootstrap estimate, applicable to any DP mechanisms, and identifies some misapplications of the bootstrap in the existing literature.
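A minimal sketch of releasing multiple privatized bootstrap estimates, assuming a Gaussian mechanism on each bootstrap mean; the noise scale, the naive percentile interval at the end, and the privacy accounting are simplifications of the paper's analysis.

```python
import numpy as np

def dp_bootstrap_estimates(data, num_boots=200, noise_scale=0.05, seed=0):
    """Release num_boots bootstrap means, each privatized with Gaussian
    noise (the privacy accounting itself is the paper's contribution)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    out = []
    for _ in range(num_boots):
        sample = rng.choice(data, size=n, replace=True)   # bootstrap resample
        out.append(sample.mean() + rng.normal(scale=noise_scale))
    return np.array(out)

# Naive percentile CI from the noisy estimates; the paper instead uses an
# inference strategy that accounts for the added DP noise.
estimates = dp_bootstrap_estimates(np.random.default_rng(1).normal(size=1000))
ci_low, ci_high = np.percentile(estimates, [2.5, 97.5])
```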
arXiv Detail & Related papers (2022-10-12T12:48:25Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
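ATC is simple enough to state in a few lines. Assuming scalar confidence scores, the threshold is chosen on labeled source validation data so that the fraction of examples above it matches the source accuracy; target accuracy is then predicted as the fraction of unlabeled target examples above that threshold. Variable names here are ours.

```python
import numpy as np

def learn_atc_threshold(val_confidences, val_correct):
    """Choose t so the fraction of source-validation examples with
    confidence above t matches the source-validation accuracy."""
    source_acc = np.mean(val_correct)
    return np.quantile(val_confidences, 1.0 - source_acc)

def predict_target_accuracy(target_confidences, t):
    """ATC estimate: fraction of unlabeled target examples above t."""
    return np.mean(target_confidences > t)
```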
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times [51.61278695776151]
Federated Learning (FL) is well known for its privacy protection when training machine learning models collaboratively among distributed clients.
Recent studies have pointed out that naive FL is susceptible to gradient leakage attacks.
Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks.
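For context, a minimal sketch of the DP mechanism being tuned: one federated-averaging round that clips each client's update and adds Gaussian noise. The clipping norm and noise multiplier are placeholder values; the paper's contribution, choosing the number of iterations under a privacy budget, is not shown.

```python
import numpy as np

def dp_fedavg_round(global_params, client_updates, clip_norm=1.0,
                    noise_mult=1.0, seed=0):
    """One federated-averaging round with DP: clip each client's model
    delta, average, and add Gaussian noise scaled to the clipping norm."""
    rng = np.random.default_rng(seed)
    clipped = [delta * min(1.0, clip_norm / max(np.linalg.norm(delta), 1e-12))
               for delta in client_updates]
    avg_update = np.mean(clipped, axis=0)
    noise = rng.normal(scale=noise_mult * clip_norm / len(clipped),
                       size=avg_update.shape)
    return global_params + avg_update + noise
```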
arXiv Detail & Related papers (2021-01-11T19:43:12Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and
Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate the empirical advantages of the proposed methods over standard gradient methods on two popular deep learning tasks.
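A minimal sketch of the DP gradient step such algorithms build on: per-example clipping, averaging, and Gaussian noise; adaptive variants would feed the noisy gradient into Adam-style moment estimates. All names and constants here are illustrative.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip=1.0,
                noise_mult=1.0, seed=0):
    """One DP-SGD step: clip each per-example gradient, average, add
    Gaussian noise. Adaptive variants would feed this noisy gradient
    into Adam-style moment estimates instead of a plain update."""
    rng = np.random.default_rng(seed)
    clipped = [g * min(1.0, clip / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    noisy_grad = (np.mean(clipped, axis=0)
                  + rng.normal(scale=noise_mult * clip / len(clipped),
                               size=params.shape))
    return params - lr * noisy_grad
```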
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.