Implicit Bias in Noisy-SGD: With Applications to Differentially Private
Training
- URL: http://arxiv.org/abs/2402.08344v1
- Date: Tue, 13 Feb 2024 10:19:33 GMT
- Title: Implicit Bias in Noisy-SGD: With Applications to Differentially Private
Training
- Authors: Tom Sander, Maxime Sylvestre, Alain Durmus
- Abstract summary: Training Deep Neural Networks (DNNs) with small batches using Stochastic Gradient Descent (SGD) yields superior test performance compared to larger batches.
DP-SGD, used to ensure differential privacy (DP) in DNNs' training, adds Gaussian noise to the clipped gradients.
Surprisingly, large-batch training still results in a significant decrease in performance, which poses an important challenge because strong DP guarantees necessitate the use of massive batches.
- Score: 9.618473763561418
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training Deep Neural Networks (DNNs) with small batches using Stochastic
Gradient Descent (SGD) yields superior test performance compared to larger
batches. The specific noise structure inherent to SGD is known to be
responsible for this implicit bias. DP-SGD, used to ensure differential privacy
(DP) in DNNs' training, adds Gaussian noise to the clipped gradients.
Surprisingly, large-batch training still results in a significant decrease in
performance, which poses an important challenge because strong DP guarantees
necessitate the use of massive batches. We first show that the phenomenon
extends to Noisy-SGD (DP-SGD without clipping), suggesting that the
stochasticity (and not the clipping) is the cause of this implicit bias, even
with additional isotropic Gaussian noise. We theoretically analyse the
solutions obtained with continuous versions of Noisy-SGD for the Linear Least
Square and Diagonal Linear Network settings, and reveal that the implicit bias
is indeed amplified by the additional noise. Thus, the performance issues of
large-batch DP-SGD training are rooted in the same underlying principles as
SGD, offering hope for potential improvements in large batch training
strategies.
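To make the update rules discussed above concrete, the following is a minimal NumPy sketch (not the authors' code) contrasting plain SGD, Noisy-SGD (Gaussian noise without clipping), and DP-SGD (per-sample clipping plus Gaussian noise) on a linear least-squares problem, the simplest setting analysed in the paper. The step size, batch size, clipping norm, and noise scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 256, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def per_sample_grads(w, idx):
    # Gradient of 0.5 * (x_i @ w - y_i)**2 for each sample i in the mini-batch.
    residuals = X[idx] @ w - y[idx]          # shape (B,)
    return residuals[:, None] * X[idx]       # shape (B, d)

def step(w, idx, lr=0.1, sigma=0.0, clip=None):
    g = per_sample_grads(w, idx)
    if clip is not None:
        # DP-SGD: rescale each per-sample gradient so its norm is at most `clip`.
        norms = np.linalg.norm(g, axis=1, keepdims=True)
        g = g * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    g_bar = g.mean(axis=0)
    if sigma > 0.0:
        # Noisy-SGD / DP-SGD: isotropic Gaussian noise on the averaged gradient.
        g_bar = g_bar + (sigma / len(idx)) * rng.normal(size=d)
    return w - lr * g_bar

w_sgd, w_noisy, w_dp = np.zeros(d), np.zeros(d), np.zeros(d)
for _ in range(500):
    idx = rng.choice(n, size=32, replace=False)      # small mini-batch
    w_sgd = step(w_sgd, idx)                         # plain SGD
    w_noisy = step(w_noisy, idx, sigma=1.0)          # Noisy-SGD: noise, no clipping
    w_dp = step(w_dp, idx, sigma=1.0, clip=1.0)      # DP-SGD: clipping + noise
```

In this sketch the injected noise on the averaged gradient scales inversely with the batch size, so both the sampling noise and the injected noise shrink as the batch grows; this large-batch regime is the one in which the abstract reports degraded test performance.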
Related papers
- How Private are DP-SGD Implementations? [61.19794019914523]
We show that there can be a substantial gap between the privacy analyses obtained under the two types of batch sampling.
arXiv Detail & Related papers (2024-03-26T13:02:43Z) - Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations [6.220757855114254]
Injecting heavy-tailed noise into the iterates of stochastic gradient descent (SGD) has received increasing attention over the past few years.
We show that SGD with heavy-tailed perturbations achieves $(0, \tilde{\mathcal{O}}(1/n))$ DP guarantees.
We illustrate that the heavy-tailed noising mechanism achieves similar DP guarantees compared to the Gaussian case, which suggests that it can be a viable alternative to its light-tailed counterparts.
arXiv Detail & Related papers (2024-03-04T13:53:41Z) - Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z) - DP-Forward: Fine-tuning and Inference on Language Models with Differential Privacy in Forward Pass [22.578388829171157]
DP-Forward perturbs embeddings in the forward pass of language models.
It almost hits the non-private baseline and outperforms DP-SGD by up to 7.7pp at a moderate privacy level.
arXiv Detail & Related papers (2023-09-13T06:37:53Z) - Bias-Aware Minimisation: Understanding and Mitigating Estimator Bias in
Private SGD [56.01810892677744]
We show a connection between per-sample gradient norms and the estimation bias of the private gradient oracle used in DP-SGD.
We propose Bias-Aware Minimisation (BAM) that allows for the provable reduction of private gradient estimator bias.
arXiv Detail & Related papers (2023-08-23T09:20:41Z) - DPIS: An Enhanced Mechanism for Differentially Private SGD with Importance Sampling [23.8561225168394]
Differential privacy (DP) has become a well-accepted standard for privacy protection, and deep neural networks (DNNs) have been immensely successful in machine learning.
A classic mechanism for this purpose is DP-SGD, a differentially private version of the stochastic gradient descent (SGD) algorithm commonly used for training.
We propose DPIS, a novel mechanism for differentially private SGD training that can be used as a drop-in replacement of the core of DP-SGD.
arXiv Detail & Related papers (2022-10-18T07:03:14Z) - Normalized/Clipped SGD with Perturbation for Differentially Private
Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - Improving Differentially Private SGD via Randomly Sparsified Gradients [31.295035726077366]
Differentially private stochastic gradient descent (DP-SGD) has been widely adopted in deep learning to provide rigorously defined privacy guarantees.
We propose to utilize randomly sparsified (RS) gradients, which can also reduce communication cost and strengthen the privacy bound.
arXiv Detail & Related papers (2021-12-01T21:43:34Z) - Dynamic Differential-Privacy Preserving SGD [19.273542515320372]
Differentially Private Stochastic Gradient Descent (DP-SGD) prevents training-data privacy breaches by adding noise to the clipped gradient during SGD training.
The same clipping operation and additive noise across training steps result in unstable updates and even a ramp-up period.
We propose dynamic DP-SGD, which incurs a lower privacy cost than standard DP-SGD during updates until both reach the same target privacy budget.
arXiv Detail & Related papers (2021-10-30T04:45:11Z) - Smoothed Differential Privacy [55.415581832037084]
Differential privacy (DP) is a widely-accepted and widely-applied notion of privacy based on worst-case analysis.
In this paper, we propose a natural extension of DP following the worst average-case idea behind the celebrated smoothed analysis.
We prove that any discrete mechanism with sampling procedures is more private than what DP predicts, while many continuous mechanisms with sampling procedures are still non-private under smoothed DP.
arXiv Detail & Related papers (2021-07-04T06:55:45Z) - Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections [73.95786440318369]
We focus on the so-called 'implicit effect' of GNIs, which is the effect of the injected noise on the dynamics of stochastic gradient descent (SGD).
We show that this effect induces an asymmetric heavy-tailed noise on gradient updates.
We then formally prove that GNIs induce an 'implicit bias', which varies depending on the heaviness of the tails and the level of asymmetry (see the sketch after this list).
arXiv Detail & Related papers (2021-02-13T21:28:09Z)
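Relating to the last entry above, here is a minimal PyTorch sketch (illustrative only, not the paper's code) of a Gaussian Noise Injection: isotropic Gaussian noise added to a layer's activations during training. The paper's claim is that the perturbation this induces on SGD's gradient updates is asymmetric and heavy-tailed, producing an implicit bias; the layer sizes and noise scale below are assumptions.

```python
import torch
import torch.nn as nn

class GNILinear(nn.Module):
    """Linear layer whose output is perturbed by Gaussian noise while training."""
    def __init__(self, d_in, d_out, noise_std=0.1):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.noise_std = noise_std

    def forward(self, x):
        out = self.linear(x)
        if self.training and self.noise_std > 0:
            # Gaussian Noise Injection on the activations.
            out = out + self.noise_std * torch.randn_like(out)
        return out

model = nn.Sequential(GNILinear(10, 64), nn.ReLU(), GNILinear(64, 1))
x = torch.randn(32, 10)
loss = model(x).pow(2).mean()
loss.backward()   # gradients now carry the injected-noise perturbation
```

Informally, because the injected noise passes through nonlinearities and the loss before reaching the gradients, the resulting gradient perturbation is a nonlinear function of a Gaussian and need not remain Gaussian, which is the intuition behind the asymmetric heavy tails studied in the paper.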