Understanding Gradient Clipping in Private SGD: A Geometric Perspective
- URL: http://arxiv.org/abs/2006.15429v2
- Date: Thu, 18 Mar 2021 03:09:28 GMT
- Title: Understanding Gradient Clipping in Private SGD: A Geometric Perspective
- Authors: Xiangyi Chen, Zhiwei Steven Wu, Mingyi Hong
- Abstract summary: Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information.
Many learning systems now incorporate differential privacy by training their models with (differentially) private SGD.
A key step in each private SGD update is gradient clipping that shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold.
- Score: 68.61254575987013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models are increasingly popular in many machine learning
applications where the training data may contain sensitive information. To
provide formal and rigorous privacy guarantees, many learning systems now
incorporate differential privacy by training their models with (differentially)
private SGD. A key step in each private SGD update is gradient clipping that
shrinks the gradient of an individual example whenever its L2 norm exceeds some
threshold. We first demonstrate how gradient clipping can prevent SGD from
converging to a stationary point. We then provide a theoretical analysis that
fully quantifies the clipping bias on convergence with a disparity measure
between the gradient distribution and a geometrically symmetric distribution.
Our empirical evaluation further suggests that the gradient distributions along
the trajectory of private SGD indeed exhibit symmetric structure that favors
convergence. Together, our results explain why private SGD with
gradient clipping remains effective in practice despite its potential clipping
bias. Finally, we develop a new perturbation-based technique that can provably
correct the clipping bias even for instances with highly asymmetric gradient
distributions.
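The clipping step described above is short enough to state in code. The following is a minimal numpy sketch of one private SGD update; the function name, parameter defaults, and the optional pre_clip_sigma perturbation (our stand-in for the paper's perturbation-based bias correction, whose exact form is not reproduced here) are illustrative choices, not the authors' implementation.
```python
import numpy as np

def private_sgd_step(w, per_example_grads, clip_threshold=1.0,
                     noise_multiplier=1.0, lr=0.1, pre_clip_sigma=0.0,
                     rng=None):
    """One private SGD update: per-example clipping, averaging,
    calibrated Gaussian noise, then a gradient step."""
    if rng is None:
        rng = np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        if pre_clip_sigma > 0.0:
            # Optional perturbation *before* clipping, standing in for the
            # paper's perturbation-based bias correction (form is ours).
            g = g + rng.normal(0.0, pre_clip_sigma, size=g.shape)
        norm = np.linalg.norm(g)
        # Shrink the gradient whenever its L2 norm exceeds the threshold.
        g = g * min(1.0, clip_threshold / (norm + 1e-12))
        clipped.append(g)
    mean_grad = np.mean(clipped, axis=0)
    # Noise scale is calibrated to the clipping threshold (the sensitivity
    # of the clipped sum), divided by the batch size for the mean.
    noise = rng.normal(0.0, noise_multiplier * clip_threshold / len(clipped),
                       size=w.shape)
    return w - lr * (mean_grad + noise)
```
The clipping bias the paper analyzes is visible even in this sketch: with per-example gradients [3, 0] and [-1, 0] and clip_threshold 1, the true mean is [1, 0] while the clipped mean is [0, 0], so an asymmetric gradient distribution can stall convergence at a non-stationary point.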
Related papers
- On the Convergence of DP-SGD with Adaptive Clipping [56.24689348875711]
Stochastic Gradient Descent (SGD) with gradient clipping is a powerful technique for enabling differentially private optimization.
This paper provides the first comprehensive convergence analysis of SGD with quantile clipping (QC-SGD).
We show that QC-SGD suffers from a bias problem similar to constant-threshold clipped SGD, but that the bias can be mitigated through a carefully designed quantile and step size schedule; a minimal sketch of quantile clipping follows this entry.
arXiv Detail & Related papers (2024-12-27T20:29:47Z)
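A minimal sketch of the quantile-clipping idea behind QC-SGD, based only on the summary above; the quantile/step-size schedule and the private estimation of the quantile itself are omitted:
```python
import numpy as np

def quantile_clip(per_example_grads, q=0.5):
    """Clip each gradient at the q-th quantile of the batch's norms
    instead of a fixed constant threshold."""
    norms = np.array([np.linalg.norm(g) for g in per_example_grads])
    # Data-dependent threshold; a real private system must also estimate
    # this quantile privately, which is omitted here.
    threshold = np.quantile(norms, q)
    return [g * min(1.0, threshold / (n + 1e-12))
            for g, n in zip(per_example_grads, norms)]
```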
- Enhancing DP-SGD through Non-monotonous Adaptive Scaling Gradient Weight [15.139854970044075]
We introduce Differentially Private Per-sample Adaptive Scaling Clipping (DP-PSASC).
This approach replaces traditional clipping with non-monotonous adaptive gradient scaling; a simplified scaling sketch follows this entry.
Our theoretical and empirical analyses confirm that DP-PSASC preserves gradient privacy and delivers superior performance across diverse datasets.
arXiv Detail & Related papers (2024-11-05T12:47:30Z)
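The exact non-monotonous weight used by DP-PSASC is not given in this summary; as an illustration only, the sketch below uses a smooth hypothetical scaling C / (||g|| + r) to show the general shape of scaling-based alternatives to hard clipping:
```python
import numpy as np

def adaptive_scale(per_example_grads, C=1.0, r=0.01):
    """Smooth per-sample gradient scaling in place of hard clipping.

    Each gradient is rescaled by C / (||g|| + r), so its norm becomes
    C * ||g|| / (||g|| + r) < C: bounded sensitivity without the hard
    min(1, C/||g||) cutoff. Illustrative stand-in, not DP-PSASC's weight."""
    return [g * (C / (np.linalg.norm(g) + r)) for g in per_example_grads]
```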
- Clip Body and Tail Separately: High Probability Guarantees for DPSGD with Heavy Tails [20.432871178766927]
Differentially Private Stochastic Gradient Descent (DPSGD) is widely utilized to preserve training data privacy in deep learning.
DPSGD clips the gradients to a bounded norm and then injects calibrated noise into the training procedure.
We propose a novel approach, Discriminative Clipping (DC)-DPSGD, with two key designs; a two-threshold sketch follows this entry.
arXiv Detail & Related papers (2024-05-27T16:30:11Z)
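A hypothetical sketch of the clip-body-and-tail-separately idea; the two-threshold split below (a batch quantile decides which gradients count as tail) is our own simplification, and DC-DPSGD's actual threshold selection and privacy accounting are more involved:
```python
import numpy as np

def discriminative_clip(per_example_grads, body_C=1.0, tail_C=4.0,
                        tail_quantile=0.95):
    """Clip 'body' and 'tail' gradients at separate thresholds."""
    norms = np.array([np.linalg.norm(g) for g in per_example_grads])
    # Gradients beyond the tail quantile get the larger threshold.
    cutoff = np.quantile(norms, tail_quantile)
    out = []
    for g, n in zip(per_example_grads, norms):
        C = tail_C if n > cutoff else body_C
        out.append(g * min(1.0, C / (n + 1e-12)))
    return out
```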
- Bias-Aware Minimisation: Understanding and Mitigating Estimator Bias in Private SGD [56.01810892677744]
We show a connection between per-sample gradient norms and the estimation bias of the private gradient oracle used in DP-SGD.
We propose Bias-Aware Minimisation (BAM), which provably reduces the bias of the private gradient estimator; a toy objective is sketched after this entry.
arXiv Detail & Related papers (2023-08-23T09:20:41Z)
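Since BAM ties clipping bias to per-sample gradient norms, one way to picture it is an objective augmented with a norm penalty. The toy model, penalty form, and bam_objective name below are our own illustration, not the paper's formulation:
```python
import numpy as np

def bam_objective(w, X, y, lam=0.1):
    """Squared loss plus a penalty on per-sample gradient norms
    (toy linear regression)."""
    residuals = X @ w - y                      # shape (n,)
    loss = 0.5 * np.mean(residuals ** 2)
    # For the squared loss, the per-sample gradient is residual_i * x_i,
    # whose L2 norm is |residual_i| * ||x_i||.
    per_sample_norms = np.abs(residuals) * np.linalg.norm(X, axis=1)
    return loss + lam * np.mean(per_sample_norms)
```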
- Improving Differentially Private SGD via Randomly Sparsified Gradients [31.295035726077366]
Differentially private stochastic gradient descent (DP-SGD) has been widely adopted in deep learning to provide a rigorously defined privacy bound.
We propose to utilize random sparsification (RS) of gradients to reduce communication cost and strengthen the privacy bound; a sketch follows this entry.
arXiv Detail & Related papers (2021-12-01T21:43:34Z)
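A minimal sketch of random sparsification before the usual clipping; the rescaling-for-unbiasedness choice is our assumption, and the paper's analysis of how RS strengthens the privacy bound is not reproduced here:
```python
import numpy as np

def sparsify_then_clip(g, keep_prob=0.5, C=1.0, rng=None):
    """Randomly sparsify a gradient, rescale for unbiasedness, then clip."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Keep each coordinate with probability keep_prob; dividing the kept
    # coordinates by keep_prob keeps the pre-clipping estimate unbiased.
    mask = rng.random(g.shape) < keep_prob
    g = np.where(mask, g / keep_prob, 0.0)
    return g * min(1.0, C / (np.linalg.norm(g) + 1e-12))
```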
- Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning [74.73901662374921]
A differentially private model degrades drastically in utility when it comprises a large number of trainable parameters.
We propose an algorithm, Gradient Embedding Perturbation (GEP), for training differentially private deep models with decent accuracy; a sketch follows this entry.
arXiv Detail & Related papers (2021-02-25T04:29:58Z)
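A sketch of the embedding idea behind GEP, assuming a given low-dimensional basis (the paper derives it from auxiliary gradients and also treats the residual component, both omitted here):
```python
import numpy as np

def gep_mean_gradient(per_example_grads, basis, C=1.0, sigma=1.0, rng=None):
    """Clip and perturb gradients in a low-dimensional embedding.

    `basis` is a (k, d) matrix with orthonormal rows; each d-dimensional
    gradient is projected to k dimensions, clipped and noised there, and
    the noisy mean is mapped back to parameter space."""
    if rng is None:
        rng = np.random.default_rng(0)
    k, d = basis.shape
    total = np.zeros(k)
    for g in per_example_grads:
        emb = basis @ g                        # low-dimensional embedding
        emb = emb * min(1.0, C / (np.linalg.norm(emb) + 1e-12))
        total += emb
    # Noise in k dimensions costs far less utility than in d dimensions.
    total += rng.normal(0.0, sigma * C, size=k)
    return basis.T @ (total / len(per_example_grads))
```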
- Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z)
- Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more than the unbiasedness of the risk estimator in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.