Understanding Gradient Clipping in Private SGD: A Geometric Perspective
- URL: http://arxiv.org/abs/2006.15429v2
- Date: Thu, 18 Mar 2021 03:09:28 GMT
- Title: Understanding Gradient Clipping in Private SGD: A Geometric Perspective
- Authors: Xiangyi Chen, Zhiwei Steven Wu, Mingyi Hong
- Abstract summary: Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information.
Many learning systems now incorporate differential privacy by training their models with (differentially) private SGD.
A key step in each private SGD update is gradient clipping that shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold.
- Score: 68.61254575987013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models are increasingly popular in many machine learning
applications where the training data may contain sensitive information. To
provide formal and rigorous privacy guarantees, many learning systems now
incorporate differential privacy by training their models with (differentially)
private SGD. A key step in each private SGD update is gradient clipping that
shrinks the gradient of an individual example whenever its L2 norm exceeds some
threshold. We first demonstrate how gradient clipping can prevent SGD from
converging to a stationary point. We then provide a theoretical analysis that
fully quantifies the clipping bias on convergence with a disparity measure
between the gradient distribution and a geometrically symmetric distribution.
Our empirical evaluation further suggests that the gradient distributions along
the trajectory of private SGD indeed exhibit symmetric structure that favors
convergence. Together, our results explain why private SGD with
gradient clipping remains effective in practice despite its potential clipping
bias. Finally, we develop a new perturbation-based technique that can provably
correct the clipping bias even for instances with highly asymmetric gradient
distributions.
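As a concrete illustration of the clipping step and of the perturbation-based correction described above, here is a minimal NumPy sketch; the function names, shapes, and constants are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One private SGD update direction: clip each per-example gradient to
    L2 norm <= clip_norm, sum, add calibrated Gaussian noise, and average."""
    rng = rng or np.random.default_rng(0)
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale  # shrink only gradients with norm > clip_norm
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)

def perturbed_clip_step(per_example_grads, clip_norm=1.0, k=1.0, rng=None):
    """Sketch of the bias-correction idea: add symmetric Gaussian noise to
    each gradient BEFORE clipping, which the abstract says provably corrects
    the clipping bias even for highly asymmetric gradient distributions."""
    rng = rng or np.random.default_rng(1)
    g = per_example_grads + rng.normal(0.0, k, size=per_example_grads.shape)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    return (g * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))).mean(axis=0)
```

The pre-clipping noise scale `k` is a tunable assumption here; the paper quantifies how the clipping bias depends on the asymmetry of the gradient distribution.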
Related papers
- Enhancing DP-SGD through Non-monotonous Adaptive Scaling Gradient Weight [15.139854970044075]
We introduce Differentially Private Per-sample Adaptive Scaling Clipping (DP-PSASC).
This approach replaces traditional clipping with non-monotonous adaptive gradient scaling.
Our theoretical and empirical analyses confirm that DP-PSASC preserves gradient privacy and delivers superior performance across diverse datasets.
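A hedged sketch of what per-sample adaptive scaling could look like in place of hard clipping; the smooth rule below (scale by C / (norm + r)) is an illustrative stand-in, not DP-PSASC's exact non-monotonous weighting:

```python
import numpy as np

def adaptive_scaling(per_example_grads, C=1.0, r=0.01):
    """Rescale every gradient smoothly instead of truncating at a threshold.
    Small gradients keep most of their magnitude; large ones are damped."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    return per_example_grads * (C / (norms + r))
```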
arXiv Detail & Related papers (2024-11-05T12:47:30Z)
- Clip Body and Tail Separately: High Probability Guarantees for DPSGD with Heavy Tails [20.432871178766927]
Differentially Private Stochastic Gradient Descent (DPSGD) is widely utilized to preserve training data privacy in deep learning.
DPSGD clips each per-sample gradient to a norm bound and then injects calibrated noise into the training procedure.
We propose a novel approach, Discriminative Clipping (DC)-DPSGD, with two key designs.
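One plausible reading of "clip body and tail separately", sketched under the assumption that body and tail gradients are split by an empirical norm quantile and clipped at different thresholds; the paper's actual two designs may differ:

```python
import numpy as np

def clip_body_and_tail(per_example_grads, body_clip=1.0, tail_clip=4.0, tail_quantile=0.9):
    """Clip 'body' gradients (norm below the empirical quantile) at body_clip
    and heavy-tailed outliers at the larger tail_clip."""
    norms = np.linalg.norm(per_example_grads, axis=1)
    cutoff = np.quantile(norms, tail_quantile)
    limits = np.where(norms <= cutoff, body_clip, tail_clip)
    scale = np.minimum(1.0, limits / np.maximum(norms, 1e-12))
    return per_example_grads * scale[:, None]
```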
arXiv Detail & Related papers (2024-05-27T16:30:11Z)
- Bias-Aware Minimisation: Understanding and Mitigating Estimator Bias in Private SGD [56.01810892677744]
We show a connection between per-sample gradient norms and the estimation bias of the private gradient oracle used in DP-SGD.
We propose Bias-Aware Minimisation (BAM) that allows for the provable reduction of private gradient estimator bias.
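The norm/bias connection can be made tangible with a small diagnostic; this is only an illustrative proxy, not BAM's actual objective:

```python
import numpy as np

def clipping_bias_norm(per_example_grads, clip_norm=1.0):
    """Distance between the clipped mean gradient and the true mean gradient:
    the more per-sample norms exceed clip_norm, the larger this bias can be."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return np.linalg.norm(clipped.mean(axis=0) - per_example_grads.mean(axis=0))
```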
arXiv Detail & Related papers (2023-08-23T09:20:41Z)
- Improving Differentially Private SGD via Randomly Sparsified Gradients [31.295035726077366]
Differentially private stochastic gradient descent (DP-SGD) has been widely adopted in deep learning to provide rigorously defined privacy guarantees.
We find that randomly sparsifying gradients before clipping and noise injection can tighten the convergence bound, and we utilize RS to strengthen DP-SGD and to reduce communication cost.
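A minimal sketch of random sparsification (RS) applied before clipping; the mask distribution and the absence of rescaling are assumptions, not necessarily the paper's analyzed variant:

```python
import numpy as np

def sparsify_then_clip(per_example_grads, keep_prob=0.5, clip_norm=1.0, rng=None):
    """Zero out each coordinate with probability 1 - keep_prob, then clip.
    Sparsification shrinks norms, so fewer gradients are distorted by
    clipping, and the resulting update is cheaper to communicate."""
    rng = rng or np.random.default_rng(0)
    g = per_example_grads * (rng.random(per_example_grads.shape) < keep_prob)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    return g * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
```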
arXiv Detail & Related papers (2021-12-01T21:43:34Z)
- Large Scale Private Learning via Low-rank Reparametrization [77.38947817228656]
We propose a reparametrization scheme to address the challenges of applying differentially private SGD on large neural networks.
We are the first to apply differential privacy to the BERT model, achieving an average accuracy of 83.9% on four downstream tasks.
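A toy sketch of the reparametrization idea: keep the full weight frozen and train only low-rank factors, so clipping and noise act on far fewer parameters. The shapes and the W0 + L @ R form are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 512, 768, 8

W0 = rng.normal(size=(d_out, d_in))             # pretrained weight, frozen
L = np.zeros((d_out, rank))                     # trainable low-rank factor
R = rng.normal(scale=0.01, size=(rank, d_in))   # trainable low-rank factor

def effective_weight():
    """Forward pass uses W0 + L @ R; DP noise is added only to the gradients
    of L and R (d_out*rank + rank*d_in values instead of d_out*d_in)."""
    return W0 + L @ R
```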
arXiv Detail & Related papers (2021-06-17T10:14:43Z)
- Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning [74.73901662374921]
Differentially private training degrades utility drastically when the model comprises a large number of trainable parameters.
We propose an algorithm, Gradient Embedding Perturbation (GEP), for training differentially private deep models with decent accuracy.
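A hedged sketch of the embedding idea: project gradients into a low-dimensional subspace, add noise there, and map back. The source of the basis (e.g., auxiliary gradients) and the handling of the residual component are omitted assumptions:

```python
import numpy as np

def gep_update(per_example_grads, basis, sigma=1.0, rng=None):
    """basis: (k, dim) with orthonormal rows spanning the gradient subspace.
    Noise is added in k dimensions rather than in the full parameter space."""
    rng = rng or np.random.default_rng(0)
    embedding = per_example_grads @ basis.T            # (batch, k)
    noisy = embedding.mean(axis=0) + rng.normal(0.0, sigma, size=basis.shape[0])
    return noisy @ basis                               # back to (dim,)
```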
arXiv Detail & Related papers (2021-02-25T04:29:58Z)
- Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z)
- Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification [47.23063195722975]
Differentially private SGD (DP-SGD) is one of the most popular methods for solving differentially private empirical risk minimization (ERM).
Due to its noisy perturbation on each gradient update, the error rate of DP-SGD scales with the ambient dimension $p$, the number of parameters in the model.
We propose Projected DP-SGD that performs noise reduction by projecting the noisy gradients to a low-dimensional subspace.
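The projection step itself is simple; a sketch assuming an identified orthonormal basis of the gradient subspace (how that basis is found is the paper's contribution and is not shown):

```python
import numpy as np

def project_noisy_gradient(noisy_grad, basis):
    """Keep only the component of the noisy gradient inside the span of
    `basis` ((k, p), orthonormal rows), discarding noise in the remaining
    p - k ambient directions."""
    return basis.T @ (basis @ noisy_grad)
```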
arXiv Detail & Related papers (2020-07-07T22:31:01Z)
- Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.