Global Convergence and Stability of Stochastic Gradient Descent
- URL: http://arxiv.org/abs/2110.01663v1
- Date: Mon, 4 Oct 2021 19:00:50 GMT
- Title: Global Convergence and Stability of Stochastic Gradient Descent
- Authors: Vivak Patel, Bowen Tian, Shushu Zhang
- Abstract summary: We show that SGD's iterates converge to a stationary point or diverge under nearly arbitrary nonconvexity and noise models.
As a consequence, SGD can be applied to a greater range of problems with confidence about its global convergence behavior.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In machine learning, stochastic gradient descent (SGD) is widely deployed to
train models using highly non-convex objectives with equally complex noise
models. Unfortunately, SGD theory often makes restrictive assumptions that fail
to capture the non-convexity of real problems, and almost entirely ignore the
complex noise models that exist in practice. In this work, we make substantial
progress on this shortcoming. First, we establish that SGD's iterates will
either globally converge to a stationary point or diverge under nearly
arbitrary nonconvexity and noise models. Under a slightly more restrictive
assumption on the joint behavior of the non-convexity and noise model that
generalizes current assumptions in the literature, we show that the objective
function cannot diverge, even if the iterates diverge. As a consequence of our
results, SGD can be applied to a greater range of stochastic optimization
problems with confidence about its global convergence behavior and stability.
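The abstract concerns plain SGD applied to a nonconvex objective with noisy gradients. As an illustration only (the objective, step size, and noise model below are hypothetical choices, not taken from the paper), a minimal sketch of SGD converging to a stationary point of a nonconvex function:

```python
import random

def sgd(grad_fn, x0, lr=0.01, steps=1000, noise=0.1, seed=0):
    """Plain SGD: x_{k+1} = x_k - lr * (grad f(x_k) + zero-mean noise)."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        g = grad_fn(x) + rng.gauss(0.0, noise)  # stochastic gradient estimate
        x -= lr * g
    return x

# Hypothetical nonconvex objective f(x) = x^4 - 3x^2,
# gradient 4x^3 - 6x, stationary points at x = 0 and x = ±sqrt(1.5).
grad = lambda x: 4 * x**3 - 6 * x
x_final = sgd(grad, x0=2.0)
```

Starting from x0 = 2.0, the iterates settle near the stationary point at sqrt(1.5) ≈ 1.22, consistent with the "converge to a stationary point or diverge" dichotomy the paper establishes.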
Related papers
- Doubly Stochastic Models: Learning with Unbiased Label Noises and
Inference Stability [85.1044381834036]
We investigate the implicit regularization effects of label noises under mini-batch sampling settings of gradient descent.
We find such implicit regularizer would favor some convergence points that could stabilize model outputs against perturbation of parameters.
Our work does not assume that SGD behaves as an Ornstein-Uhlenbeck-like process, and achieves a more general result with convergence of the approximation proved.
arXiv Detail & Related papers (2023-04-01T14:09:07Z) - Large deviations rates for stochastic gradient descent with strongly
convex functions [11.247580943940916]
We provide a formal framework for the study of general high probability bounds with gradient descent.
We find an upper large deviations bound for SGD with strongly convex functions.
arXiv Detail & Related papers (2022-11-02T09:15:26Z) - Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
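The clipping referred to above rescales the stochastic gradient whenever its norm exceeds a threshold, which tames heavy-tailed noise. A minimal one-dimensional sketch (the step size and threshold are illustrative assumptions, not values from the paper):

```python
def clipped_sgd_step(x, g, lr=0.01, clip=1.0):
    """One clipped-SGD update: rescale g to norm `clip` if it is larger.
    Scalar case shown; for vectors, replace abs() with a vector norm."""
    norm = abs(g)
    if norm > clip:
        g = g * (clip / norm)  # heavy-tailed outlier is capped at `clip`
    return x - lr * g

# A gradient sample of 10.0 is clipped to 1.0 before the step is taken.
x_next = clipped_sgd_step(0.0, 10.0, lr=0.1, clip=1.0)
```

Because the clipped update magnitude is bounded by `lr * clip`, a single heavy-tailed gradient sample cannot move the iterate arbitrarily far, which is the mechanism behind the high-probability guarantees mentioned above.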
arXiv Detail & Related papers (2022-06-02T15:21:55Z) - Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to training data.
We analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z) - Tackling benign nonconvexity with smoothing and stochastic gradients [28.197694894254305]
Non-convex optimization problems are common in machine learning, especially in deep learning.
In this paper we show how stochastic gradient descent can be used to solve such problems.
arXiv Detail & Related papers (2022-02-18T07:27:32Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Stochastic Training is Not Necessary for Generalization [57.04880404584737]
It is widely believed that the implicit regularization of stochastic gradient descent (SGD) is fundamental to the impressive generalization behavior we observe in neural networks.
In this work, we demonstrate that non-stochastic full-batch training can achieve strong performance on CIFAR-10 that is on-par with SGD.
arXiv Detail & Related papers (2021-09-29T00:50:00Z) - Fine-Grained Analysis of Stability and Generalization for Stochastic
Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting.
To the best of our knowledge, this gives the first-ever-known stability and generalization bounds for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.