Implicit Bias in Noisy-SGD: With Applications to Differentially Private
Training
- URL: http://arxiv.org/abs/2402.08344v1
- Date: Tue, 13 Feb 2024 10:19:33 GMT
- Title: Implicit Bias in Noisy-SGD: With Applications to Differentially Private
Training
- Authors: Tom Sander, Maxime Sylvestre, Alain Durmus
- Abstract summary: Training Deep Neural Networks (DNNs) with small batches using Stochastic Gradient Descent (SGD) yields superior test performance compared to larger batches.
DP-SGD, used to ensure differential privacy (DP) in DNNs' training, adds Gaussian noise to the clipped gradients.
Surprisingly, large-batch training still results in a significant decrease in performance, which poses an important challenge because strong DP guarantees necessitate the use of massive batches.
- Score: 9.618473763561418
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training Deep Neural Networks (DNNs) with small batches using Stochastic
Gradient Descent (SGD) yields superior test performance compared to larger
batches. The specific noise structure inherent to SGD is known to be
responsible for this implicit bias. DP-SGD, used to ensure differential privacy
(DP) in DNNs' training, adds Gaussian noise to the clipped gradients.
Surprisingly, large-batch training still results in a significant decrease in
performance, which poses an important challenge because strong DP guarantees
necessitate the use of massive batches. We first show that the phenomenon
extends to Noisy-SGD (DP-SGD without clipping), suggesting that the
stochasticity (and not the clipping) is the cause of this implicit bias, even
with additional isotropic Gaussian noise. We theoretically analyse the
solutions obtained with continuous versions of Noisy-SGD for the Linear Least
Square and Diagonal Linear Network settings, and reveal that the implicit bias
is indeed amplified by the additional noise. Thus, the performance issues of
large-batch DP-SGD training are rooted in the same underlying principles as
SGD, offering hope for potential improvements in large batch training
strategies.
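To make the update rules discussed above concrete, the following is a minimal NumPy sketch (not the authors' code) contrasting plain SGD, Noisy-SGD (Gaussian noise without clipping), and DP-SGD (per-sample clipping plus Gaussian noise) on a linear least-squares problem, the simplest setting analysed in the paper. The step size, batch size, clipping norm, and noise scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 256, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def per_sample_grads(w, idx):
    # Gradient of 0.5 * (x_i @ w - y_i)**2 for each sample i in the mini-batch.
    residuals = X[idx] @ w - y[idx]          # shape (B,)
    return residuals[:, None] * X[idx]       # shape (B, d)

def step(w, idx, lr=0.1, sigma=0.0, clip=None):
    g = per_sample_grads(w, idx)
    if clip is not None:
        # DP-SGD: rescale each per-sample gradient so its norm is at most `clip`.
        norms = np.linalg.norm(g, axis=1, keepdims=True)
        g = g * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    g_bar = g.mean(axis=0)
    if sigma > 0.0:
        # Noisy-SGD / DP-SGD: isotropic Gaussian noise on the averaged gradient.
        g_bar = g_bar + (sigma / len(idx)) * rng.normal(size=d)
    return w - lr * g_bar

w_sgd, w_noisy, w_dp = np.zeros(d), np.zeros(d), np.zeros(d)
for _ in range(500):
    idx = rng.choice(n, size=32, replace=False)      # small mini-batch
    w_sgd = step(w_sgd, idx)                         # plain SGD
    w_noisy = step(w_noisy, idx, sigma=1.0)          # Noisy-SGD: noise, no clipping
    w_dp = step(w_dp, idx, sigma=1.0, clip=1.0)      # DP-SGD: clipping + noise
```

In this sketch the injected noise on the averaged gradient scales inversely with the batch size, so both the sampling noise and the injected noise shrink as the batch grows; this large-batch regime is the one in which the abstract reports degraded test performance.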
Related papers
- How Private are DP-SGD Implementations? [61.19794019914523]
We show that there can be a substantial gap between the privacy analyses obtained under the two types of batch sampling.
arXiv Detail & Related papers (2024-03-26T13:02:43Z) - Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations [6.220757855114254]
Injecting heavy-tailed noise into the iterates of stochastic gradient descent (SGD) has received increasing attention over the past few years.
We show that SGD with heavy-tailed perturbations achieves $(0, \tilde{\mathcal{O}}(1/n))$ DP guarantees.
We illustrate that the heavy-tailed noising mechanism achieves similar DP guarantees compared to the Gaussian case, which suggests that it can be a viable alternative to its light-tailed counterparts.
arXiv Detail & Related papers (2024-03-04T13:53:41Z) - Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z) - DP-Forward: Fine-tuning and Inference on Language Models with Differential Privacy in Forward Pass [22.578388829171157]
DP-Forward perturbs embeddings in the forward pass of language models.
It almost hits the non-private baseline and outperforms DP-SGD by up to 7.7pp at a moderate privacy level.
arXiv Detail & Related papers (2023-09-13T06:37:53Z) - Bias-Aware Minimisation: Understanding and Mitigating Estimator Bias in
Private SGD [56.01810892677744]
We show a connection between per-sample gradient norms and the estimation bias of the private gradient oracle used in DP-SGD.
We propose Bias-Aware Minimisation (BAM) that allows for the provable reduction of private gradient estimator bias.
arXiv Detail & Related papers (2023-08-23T09:20:41Z) - DPIS: An Enhanced Mechanism for Differentially Private SGD with Importance Sampling [23.8561225168394]
Differential privacy (DP) has become a well-accepted standard for privacy protection, and deep neural networks (DNNs) have been immensely successful in machine learning.
A classic mechanism for this purpose is DP-SGD, a differentially private version of the stochastic gradient descent (SGD) algorithm commonly used for training.
We propose DPIS, a novel mechanism for differentially private SGD training that can be used as a drop-in replacement of the core of DP-SGD.
arXiv Detail & Related papers (2022-10-18T07:03:14Z) - Normalized/Clipped SGD with Perturbation for Differentially Private
Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - Improving Differentially Private SGD via Randomly Sparsified Gradients [31.295035726077366]
Differentially private stochastic gradient descent (DP-SGD) has been widely adopted in deep learning to provide rigorously defined privacy guarantees.
We propose to utilize randomly sparsified (RS) gradients, which can also reduce communication cost and strengthen the privacy bound.
arXiv Detail & Related papers (2021-12-01T21:43:34Z) - Dynamic Differential-Privacy Preserving SGD [19.273542515320372]
Differentially Private Stochastic Gradient Descent (DP-SGD) prevents training-data privacy breaches by adding noise to the clipped gradient during SGD training.
The same clipping operation and additive noise across training steps result in unstable updates and even a ramp-up period.
We propose dynamic DP-SGD, which incurs a lower privacy cost than standard DP-SGD during updates until both reach the same target privacy budget.
arXiv Detail & Related papers (2021-10-30T04:45:11Z) - Smoothed Differential Privacy [55.415581832037084]
Differential privacy (DP) is a widely-accepted and widely-applied notion of privacy based on worst-case analysis.
In this paper, we propose a natural extension of DP following the worst average-case idea behind the celebrated smoothed analysis.
We prove that any discrete mechanism with sampling procedures is more private than what DP predicts, while many continuous mechanisms with sampling procedures are still non-private under smoothed DP.
arXiv Detail & Related papers (2021-07-04T06:55:45Z) - Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections [73.95786440318369]
We focus on the so-called 'implicit effect' of GNIs, which is the effect of the injected noise on the dynamics of stochastic gradient descent (SGD).
We show that this effect induces an asymmetric heavy-tailed noise on gradient updates.
We then formally prove that GNIs induce an 'implicit bias', which varies depending on the heaviness of the tails and the level of asymmetry (see the sketch after this list).
arXiv Detail & Related papers (2021-02-13T21:28:09Z)
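Relating to the last entry above, here is a minimal PyTorch sketch (illustrative only, not the paper's code) of a Gaussian Noise Injection: isotropic Gaussian noise added to a layer's activations during training. The paper's claim is that the perturbation this induces on SGD's gradient updates is asymmetric and heavy-tailed, producing an implicit bias; the layer sizes and noise scale below are assumptions.

```python
import torch
import torch.nn as nn

class GNILinear(nn.Module):
    """Linear layer whose output is perturbed by Gaussian noise while training."""
    def __init__(self, d_in, d_out, noise_std=0.1):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.noise_std = noise_std

    def forward(self, x):
        out = self.linear(x)
        if self.training and self.noise_std > 0:
            # Gaussian Noise Injection on the activations.
            out = out + self.noise_std * torch.randn_like(out)
        return out

model = nn.Sequential(GNILinear(10, 64), nn.ReLU(), GNILinear(64, 1))
x = torch.randn(32, 10)
loss = model(x).pow(2).mean()
loss.backward()   # gradients now carry the injected-noise perturbation
```

Informally, because the injected noise passes through nonlinearities and the loss before reaching the gradients, the resulting gradient perturbation is a nonlinear function of a Gaussian and need not remain Gaussian, which is the intuition behind the asymmetric heavy tails studied in the paper.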