Batch Clipping and Adaptive Layerwise Clipping for Differential Private
Stochastic Gradient Descent
- URL: http://arxiv.org/abs/2307.11939v1
- Date: Fri, 21 Jul 2023 23:37:37 GMT
- Title: Batch Clipping and Adaptive Layerwise Clipping for Differential Private
Stochastic Gradient Descent
- Authors: Toan N. Nguyen, Phuong Ha Nguyen, Lam M. Nguyen, Marten Van Dijk
- Abstract summary: Differential Private Stochastic Gradient Descent (DPSGD) transmits a sum of clipped gradients obfuscated with Gaussian noise to a central server.
The paper introduces Batch Clipping (BC), where, instead of clipping single gradients, batches of gradients are averaged and clipped.
It also studies Adaptive Layerwise Clipping (ALC) methods, where each layer has its own adaptively finetuned clipping constant.
- Score: 21.55827140532476
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Each round in Differential Private Stochastic Gradient Descent (DPSGD)
transmits a sum of clipped gradients obfuscated with Gaussian noise to a
central server which uses this to update a global model which often represents
a deep neural network. Since the clipped gradients are computed separately,
which we call Individual Clipping (IC), deep neural networks like ResNet-18
cannot use Batch Normalization Layers (BNL), which are a crucial component in
deep neural networks for achieving high accuracy. To utilize BNL, we
introduce Batch Clipping (BC) where, instead of clipping single gradients as in
the original DPSGD, we average and clip batches of gradients. Moreover, the
model entries of different layers have different sensitivities to the added
Gaussian noise. Therefore, Adaptive Layerwise Clipping methods (ALC), where
each layer has its own adaptively finetuned clipping constant, have been
introduced and studied, but so far without rigorous DP proofs. In this paper,
we propose a new ALC and provide rigorous DP proofs for both BC and ALC.
Experiments show that our modified DPSGD with BC and ALC for CIFAR-10 with
ResNet-18 converges while DPSGD with IC and ALC does not.
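The difference between Individual Clipping and Batch Clipping described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the plain-list gradient representation, and the noise scale `sigma * c` (the usual Gaussian-mechanism scaling) are assumptions made for illustration, and `layerwise_clip` uses fixed, hypothetical per-layer constants rather than the paper's adaptive rule.

```python
import math
import random


def l2_norm(vec):
    """Euclidean norm of a flat gradient vector."""
    return math.sqrt(sum(x * x for x in vec))


def clip(vec, c):
    """Scale vec down so that its L2 norm is at most c."""
    norm = l2_norm(vec)
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def individual_clipping_sum(per_example_grads, c):
    """IC: clip every per-example gradient separately, then sum."""
    total = [0.0] * len(per_example_grads[0])
    for grad in per_example_grads:
        for i, x in enumerate(clip(grad, c)):
            total[i] += x
    return total


def batch_clipping(per_example_grads, c):
    """BC: average the gradients of the batch first, then clip once."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    avg = [sum(g[i] for g in per_example_grads) / n for i in range(dim)]
    return clip(avg, c)


def layerwise_clip(layer_grads, constants):
    """Clip each layer's gradient with its own constant.

    The constants are fixed inputs here (an assumption); the paper's
    adaptive choice of constants is not reproduced.
    """
    return [clip(g, c) for g, c in zip(layer_grads, constants)]


def add_gaussian_noise(vec, c, sigma, rng):
    """Add Gaussian noise with standard deviation sigma * c (assumed scaling)."""
    return [x + rng.gauss(0.0, sigma * c) for x in vec]


# Toy per-example gradients for a two-parameter model.
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
rng = random.Random(0)
ic_update = add_gaussian_noise(individual_clipping_sum(grads, 1.0), 1.0, 0.5, rng)
bc_update = add_gaussian_noise(batch_clipping(grads, 1.0), 1.0, 0.5, rng)
```

Because `batch_clipping` only needs the averaged gradient of a batch, the forward and backward passes can process the batch jointly, which is the property that makes batch normalization usable; `individual_clipping_sum` needs each example's gradient in isolation.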
Related papers
- On the Convergence of DP-SGD with Adaptive Clipping [56.24689348875711]
Stochastic Gradient Descent (SGD) with gradient clipping is a powerful technique for enabling differentially private optimization.
This paper provides the first comprehensive convergence analysis of SGD with quantile clipping (QC-SGD).
We show how QC-SGD suffers from a bias problem similar to constant-threshold clipped SGD but can be mitigated through a carefully designed quantile and step size schedule.
arXiv Detail & Related papers (2024-12-27T20:29:47Z)
- Gradient Normalization Provably Benefits Nonconvex SGD under Heavy-Tailed Noise [60.92029979853314]
We investigate the roles of gradient normalization and clipping in ensuring the convergence of Stochastic Gradient Descent (SGD) under heavy-tailed noise.
Our work provides the first theoretical evidence demonstrating the benefits of gradient normalization in SGD under heavy-tailed noise.
We introduce an accelerated SGD variant incorporating gradient normalization and clipping, further enhancing convergence rates under heavy-tailed noise.
arXiv Detail & Related papers (2024-10-21T22:40:42Z)
- Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Stochastic Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z)
- Generalizing DP-SGD with Shuffling and Batch Clipping [21.55827140532476]
DP-SGD implements individual clipping with random subsampling, which forces a mini-batch SGD approach.
We provide a general differential private algorithmic framework that goes beyond DP-SGD and allows any possible first order summings.
We show a $\sqrt{g E}$ DP dependency for batch clipping with shuffling.
arXiv Detail & Related papers (2022-12-12T09:43:26Z)
- Differentially Private Learning with Per-Sample Adaptive Clipping [8.401653565794353]
We propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC) algorithm based on a non-monotonic adaptive weight function.
We show that DP-PSAC outperforms or matches the state-of-the-art methods on multiple main-stream vision and language tasks.
arXiv Detail & Related papers (2022-12-01T07:26:49Z)
- Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
- Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger [39.93710312222771]
Per-example clipping is a key algorithmic step that enables practical differential private (DP) training for deep learning models.
We propose an easy-to-use replacement, called automatic clipping, that eliminates the need to tune the clipping threshold R for any DP optimizer.
arXiv Detail & Related papers (2022-06-14T19:49:44Z)
- Improving Differentially Private SGD via Randomly Sparsified Gradients [31.295035726077366]
Differentially private stochastic gradient descent (DP-SGD) has been widely adopted in deep learning to provide rigorously defined privacy.
We propose to utilize random sparsification (RS) of gradients to reduce communication cost and strengthen privacy.
arXiv Detail & Related papers (2021-12-01T21:43:34Z)
- Differentially private training of neural networks with Langevin dynamics for calibrated predictive uncertainty [58.730520380312676]
We show that differentially private stochastic gradient descent (DP-SGD) can yield poorly calibrated, overconfident deep learning models.
This represents a serious issue for safety-critical applications, e.g. in medical diagnosis.
arXiv Detail & Related papers (2021-07-09T08:14:45Z)
- Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning method based on the Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable is subject to a sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z)
- Understanding Gradient Clipping in Private SGD: A Geometric Perspective [68.61254575987013]
Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information.
Many learning systems now incorporate differential privacy by training their models with (differentially) private SGD.
A key step in each private SGD update is gradient clipping that shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold.
arXiv Detail & Related papers (2020-06-27T19:08:12Z)