Noise is All You Need: Private Second-Order Convergence of Noisy SGD
- URL: http://arxiv.org/abs/2410.06878v1
- Date: Wed, 9 Oct 2024 13:43:17 GMT
- Title: Noise is All You Need: Private Second-Order Convergence of Noisy SGD
- Authors: Dmitrii Avdiukhin, Michael Dinitz, Chenglin Fan, Grigory Yaroslavtsev,
- Abstract summary: We show that noise necessary for privacy already implies second-order convergence under the standard smoothness assumptions.
We get second-order convergence essentially for free: DP-SGD, the workhorse of modern private optimization, under minimal assumptions can be used to find a second-order stationary point.
- Score: 15.31952197599396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Private optimization is a topic of major interest in machine learning, with differentially private stochastic gradient descent (DP-SGD) playing a key role in both theory and practice. Furthermore, DP-SGD is known to be a powerful tool in contexts beyond privacy, including robustness, machine unlearning, etc. Existing analyses of DP-SGD either make relatively strong assumptions (e.g., Lipschitz continuity of the loss function, or even convexity) or prove only first-order convergence (and thus might end at a saddle point in the non-convex setting). At the same time, there has been progress in proving second-order convergence of the non-private version of ``noisy SGD'', as well as progress in designing algorithms that are more complex than DP-SGD and do guarantee second-order convergence. We revisit DP-SGD and show that ``noise is all you need'': the noise necessary for privacy already implies second-order convergence under the standard smoothness assumptions, even for non-Lipschitz loss functions. Hence, we get second-order convergence essentially for free: DP-SGD, the workhorse of modern private optimization, under minimal assumptions can be used to find a second-order stationary point.
Related papers
- Convergent Privacy Loss of Noisy-SGD without Convexity and Smoothness [16.303040664382138]
We study the Differential Privacy (DP) guarantee of hidden-state Noisy-SGD algorithms over a bounded domain.
We prove convergent R'enyi DP bound for non-smooth non-smooth losses.
We also provide a strictly better privacy bound compared to state-of-the-art results for smooth convex losses.
arXiv Detail & Related papers (2024-10-01T20:52:08Z) - It's Our Loss: No Privacy Amplification for Hidden State DP-SGD With Non-Convex Loss [0.76146285961466]
We show that for specific loss functions, the final iterate of DP-SGD leaks as much information as the final loss function.
We conclude that no privacy amplification is possible for DP-SGD in general for all (non-) loss functions.
arXiv Detail & Related papers (2024-07-09T01:58:19Z) - How Private are DP-SGD Implementations? [61.19794019914523]
We show that there can be a substantial gap between the privacy analysis when using the two types of batch sampling.
Our result shows that there can be a substantial gap between the privacy analysis when using the two types of batch sampling.
arXiv Detail & Related papers (2024-03-26T13:02:43Z) - Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations [6.220757855114254]
Injecting heavy-tailed noise to the iterates of descent (SGD) has received increasing attention over the past few years.
We show that SGD with heavy-tailed perturbations achieves $(0, tildemathcalO (1/n)$ DP guarantees.
We illustrate that the heavy-tailed noising mechanism achieves similar DP guarantees compared to the Gaussian case, which suggests that it can be a viable alternative to its light-tailed counterparts.
arXiv Detail & Related papers (2024-03-04T13:53:41Z) - Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach [62.000948039914135]
Using Differentially Private Gradient Descent with Gradient Clipping (DPSGD-GC) to ensure Differential Privacy (DP) comes at the cost of model performance degradation.
We propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC.
We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on R'enyi DP.
arXiv Detail & Related papers (2023-11-24T17:56:44Z) - DPIS: An Enhanced Mechanism for Differentially Private SGD with Importance Sampling [23.8561225168394]
differential privacy (DP) has become a well-accepted standard for privacy protection, and deep neural networks (DNN) have been immensely successful in machine learning.
A classic mechanism for this purpose is DP-SGD, which is a differentially private version of the gradient descent (SGD) commonly used for training.
We propose DPIS, a novel mechanism for differentially private SGD training that can be used as a drop-in replacement of the core of DP-SGD.
arXiv Detail & Related papers (2022-10-18T07:03:14Z) - Normalized/Clipped SGD with Perturbation for Differentially Private
Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - Differentially Private SGDA for Minimax Problems [83.57322009102973]
We prove that gradient descent ascent (SGDA) can achieve optimal utility in terms of weak primal-dual population risk.
This is the first-ever-known result for non-smoothly-strongly-concave setting.
arXiv Detail & Related papers (2022-01-22T13:05:39Z) - Smoothed Differential Privacy [55.415581832037084]
Differential privacy (DP) is a widely-accepted and widely-applied notion of privacy based on worst-case analysis.
In this paper, we propose a natural extension of DP following the worst average-case idea behind the celebrated smoothed analysis.
We prove that any discrete mechanism with sampling procedures is more private than what DP predicts, while many continuous mechanisms with sampling procedures are still non-private under smoothed DP.
arXiv Detail & Related papers (2021-07-04T06:55:45Z) - Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses [52.039438701530905]
We provide sharp upper and lower bounds for several forms of gradient descent (SGD) on arbitrary Lipschitz nonsmooth convex losses.
Our bounds allow us to derive a new algorithm for differentially private nonsmooth convex optimization with optimal excess population risk.
arXiv Detail & Related papers (2020-06-12T02:45:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.