Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations
- URL: http://arxiv.org/abs/2403.02051v1
- Date: Mon, 4 Mar 2024 13:53:41 GMT
- Title: Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations
- Authors: Umut \c{S}im\c{s}ekli, Mert G\"urb\"uzbalaban, Sinan
Y{\i}ld{\i}r{\i}m, Lingjiong Zhu
- Abstract summary: Injecting heavy-tailed noise to the iterates of descent (SGD) has received increasing attention over the past few years.
We show that SGD with heavy-tailed perturbations achieves $(0, tildemathcalO (1/n)$ DP guarantees.
We illustrate that the heavy-tailed noising mechanism achieves similar DP guarantees compared to the Gaussian case, which suggests that it can be a viable alternative to its light-tailed counterparts.
- Score: 6.220757855114254
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Injecting heavy-tailed noise to the iterates of stochastic gradient descent
(SGD) has received increasing attention over the past few years. While various
theoretical properties of the resulting algorithm have been analyzed mainly
from learning theory and optimization perspectives, their privacy preservation
properties have not yet been established. Aiming to bridge this gap, we provide
differential privacy (DP) guarantees for noisy SGD, when the injected noise
follows an $\alpha$-stable distribution, which includes a spectrum of
heavy-tailed distributions (with infinite variance) as well as the Gaussian
distribution. Considering the $(\epsilon, \delta)$-DP framework, we show that
SGD with heavy-tailed perturbations achieves $(0, \tilde{\mathcal{O}}(1/n))$-DP
for a broad class of loss functions which can be non-convex, where $n$ is the
number of data points. As a remarkable byproduct, contrary to prior work that
necessitates bounded sensitivity for the gradients or clipping the iterates,
our theory reveals that under mild assumptions, such a projection step is not
actually necessary. We illustrate that the heavy-tailed noising mechanism
achieves similar DP guarantees compared to the Gaussian case, which suggests
that it can be a viable alternative to its light-tailed counterparts.
Related papers
- Noise is All You Need: Private Second-Order Convergence of Noisy SGD [15.31952197599396]
We show that noise necessary for privacy already implies second-order convergence under the standard smoothness assumptions.
We get second-order convergence essentially for free: DP-SGD, the workhorse of modern private optimization, under minimal assumptions can be used to find a second-order stationary point.
arXiv Detail & Related papers (2024-10-09T13:43:17Z) - Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution [6.144680854063938]
We consider a variant of the gradient descent (SGD) with a random learning rate to reveal its convergence properties.
We demonstrate that a distribution of a parameter updated by Poisson SGD converges to a stationary distribution under weak assumptions.
arXiv Detail & Related papers (2024-06-23T06:52:33Z) - Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on R'enyi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z) - Normalized/Clipped SGD with Perturbation for Differentially Private
Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - Efficient Private SCO for Heavy-Tailed Data via Averaged Clipping [40.69950711262191]
We consider differentially private convex optimization for heavy-tailed data with the guarantee of being differentially private (DP)
We establish new convergence results and improved complexity bounds for the proposed algorithm called AClipped-dpSGD for constrained and unconstrained convex problems.
arXiv Detail & Related papers (2022-06-27T01:39:15Z) - Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent may gradient descent (SGD) be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit training data.
We analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z) - Non Asymptotic Bounds for Optimization via Online Multiplicative
Stochastic Gradient Descent [0.0]
The gradient noise of Gradient Descent (SGD) is considered to play a key role in its properties.
We show that noise classes that have the same mean and covariance structure of SGD via minibatching have similar properties.
We also establish bounds for the convergence of the M-SGD algorithm in the strongly convex regime.
arXiv Detail & Related papers (2021-12-14T02:25:43Z) - Optimizing Information-theoretical Generalization Bounds via Anisotropic
Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that with constraint to guarantee low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
arXiv Detail & Related papers (2021-10-26T15:02:27Z) - Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses [52.039438701530905]
We provide sharp upper and lower bounds for several forms of gradient descent (SGD) on arbitrary Lipschitz nonsmooth convex losses.
Our bounds allow us to derive a new algorithm for differentially private nonsmooth convex optimization with optimal excess population risk.
arXiv Detail & Related papers (2020-06-12T02:45:21Z) - Non-Convex SGD Learns Halfspaces with Adversarial Label Noise [50.659479930171585]
We show the solution to the problem of learning surrogate learning homogeneous halfspaces in the distribution-specific model.
In any convex distributions, we show that the misclassification error inherently leads to misclassification error of halfspace.
arXiv Detail & Related papers (2020-06-11T18:55:59Z) - The Heavy-Tail Phenomenon in SGD [7.366405857677226]
We show that depending on the structure of the Hessian of the loss at the minimum, the SGD iterates will converge to a emphheavy-tailed stationary distribution.
We translate our results into insights about the behavior of SGD in deep learning.
arXiv Detail & Related papers (2020-06-08T16:43:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.