Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
- URL: http://arxiv.org/abs/2006.06914v1
- Date: Fri, 12 Jun 2020 02:45:21 GMT
- Title: Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
- Authors: Raef Bassily, Vitaly Feldman, Cristóbal Guzmán, Kunal Talwar
- Abstract summary: We provide sharp upper and lower bounds for several forms of stochastic gradient descent (SGD) on arbitrary Lipschitz nonsmooth convex losses.
Our bounds allow us to derive a new algorithm for differentially private nonsmooth convex optimization with optimal excess population risk.
- Score: 52.039438701530905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Uniform stability is a notion of algorithmic stability that bounds the worst
case change in the model output by the algorithm when a single data point in
the dataset is replaced. An influential work of Hardt et al. (2016) provides
strong upper bounds on the uniform stability of the stochastic gradient descent
(SGD) algorithm on sufficiently smooth convex losses. These results led to
important progress in understanding the generalization properties of SGD and
several applications to differentially private convex optimization for smooth
losses.
Our work is the first to address uniform stability of SGD on *nonsmooth*
convex losses. Specifically, we provide sharp upper and lower bounds for
several forms of SGD and full-batch GD on arbitrary Lipschitz nonsmooth convex
losses. Our lower bounds show that, in the nonsmooth case, (S)GD can be
inherently less stable than in the smooth case. On the other hand, our upper
bounds show that (S)GD is sufficiently stable for deriving new and useful
bounds on generalization error. Most notably, we obtain the first
dimension-independent generalization bounds for multi-pass SGD in the nonsmooth
case. In addition, our bounds allow us to derive a new algorithm for
differentially private nonsmooth stochastic convex optimization with optimal
excess population risk. Our algorithm is simpler and more efficient than the
best known algorithm for the nonsmooth case due to Feldman et al. (2020).
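For readers unfamiliar with the notion, the uniform stability property described in the abstract is usually formalized as follows (a sketch in standard notation; the symbols below are our own shorthand, not taken verbatim from the paper). A randomized algorithm $A$ is $\varepsilon$-uniformly stable with respect to a loss $\ell$ if, for every pair of datasets $S, S' \in \mathcal{Z}^n$ differing in a single example,
$$\sup_{z \in \mathcal{Z}} \; \mathbb{E}_{A}\big[\ell(A(S), z) - \ell(A(S'), z)\big] \;\le\; \varepsilon.$$
A classical consequence (Bousquet and Elisseeff, 2002; used by Hardt et al., 2016) is that uniform stability bounds the expected generalization gap,
$$\mathbb{E}_{S, A}\big[R(A(S)) - \widehat{R}_S(A(S))\big] \;\le\; \varepsilon,$$
where $R$ denotes the population risk and $\widehat{R}_S$ the empirical risk on $S$.

As a concrete illustration of the kind of algorithm analyzed, below is a minimal sketch (our own example, not code from the paper) of one form of multi-pass SGD on a nonsmooth Lipschitz convex loss: projected subgradient steps on the absolute-error loss, with with-replacement sampling and iterate averaging.

```python
import numpy as np

def subgradient(w, x, y):
    # Subgradient of the nonsmooth loss |<w, x> - y| with respect to w;
    # np.sign(0) = 0 picks a valid subgradient at the kink.
    return np.sign(np.dot(w, x) - y) * x

def project_ball(w, radius):
    # Euclidean projection onto the ball of the given radius (keeps iterates bounded).
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def multipass_sgd(X, Y, passes=5, step=0.1, radius=1.0, seed=0):
    # Multi-pass SGD with with-replacement sampling; returns the averaged iterate.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    avg = np.zeros(d)
    for t in range(passes * n):
        i = rng.integers(n)                    # pick one example uniformly at random
        g = subgradient(w, X[i], Y[i])         # subgradient of the nonsmooth loss
        w = project_ball(w - step / np.sqrt(t + 1) * g, radius)
        avg += (w - avg) / (t + 1)             # running average of the iterates
    return avg

# Tiny usage example on synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
Y = X @ rng.normal(size=5)
w_hat = multipass_sgd(X, Y)
print(float(np.mean(np.abs(X @ w_hat - Y))))   # average absolute-error training loss
```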
Related papers
- Random Scaling and Momentum for Non-smooth Non-convex Optimization [38.443430569753026]
Training neural networks requires optimizing a loss function that may be highly irregular, and in particular neither convex nor smooth.
Popular training algorithms are based on stochastic gradient descent with momentum (SGDM), for which classical analysis applies only if the loss is either convex or smooth.
arXiv Detail & Related papers (2024-05-16T00:52:03Z) - Nonconvex Stochastic Bregman Proximal Gradient Method for Nonconvex Composite Problems [9.202586157819693]
Stochastic gradient methods for nonconvex composite objective functions typically rely on the Lipschitz smoothness of the differentiable part.
We propose a better approximation model that handles non-Lipschitz gradients in nonconvex objectives.
We show that it achieves optimal robustness in terms of step-size selection sensitivity.
arXiv Detail & Related papers (2023-06-26T08:54:46Z) - On Convergence of Incremental Gradient for Non-Convex Smooth Functions [63.51187646914962]
In machine learning and network optimization, algorithms like shuffle SGD are popular because they minimize the number of cache misses and have good cache locality.
This paper delves into the convergence properties of SGD algorithms with arbitrary data ordering.
arXiv Detail & Related papers (2023-05-30T17:47:27Z) - Efficient Private SCO for Heavy-Tailed Data via Averaged Clipping [40.69950711262191]
We consider stochastic convex optimization for heavy-tailed data with the guarantee of being differentially private (DP).
We establish new convergence results and improved complexity bounds for the proposed algorithm called AClipped-dpSGD for constrained and unconstrained convex problems.
arXiv Detail & Related papers (2022-06-27T01:39:15Z) - Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to the training data.
We analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur, and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z) - Differentially Private SGDA for Minimax Problems [83.57322009102973]
We prove that stochastic gradient descent ascent (SGDA) can achieve optimal utility in terms of weak primal-dual population risk.
This is the first known result for the nonsmooth, strongly-concave setting.
arXiv Detail & Related papers (2022-01-22T13:05:39Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random-order online optimization model of Garber et al. (2020), where the loss functions may be chosen by an adversary but are then presented online in a uniformly random order.
We show that algorithms in this model can achieve the optimal bounds by significantly improving their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - An improved convergence analysis for decentralized online stochastic
non-convex optimization [17.386715847732468]
In this paper, we show that a gradient-tracking (GT) based method, for functions satisfying the Polyak-Łojasiewicz (PŁ) condition, achieves convergence rates that are not only immediately applicable but also match the currently best known rates.
arXiv Detail & Related papers (2020-08-10T15:29:13Z) - Fine-Grained Analysis of Stability and Generalization for Stochastic
Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds that depend on the behavior of the best model, and leads to the first known fast-rate bounds in the low-noise setting.
To the best of our knowledge, this gives the first known stability and generalization bounds for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z)