Large deviations rates for stochastic gradient descent with strongly
convex functions
- URL: http://arxiv.org/abs/2211.00969v1
- Date: Wed, 2 Nov 2022 09:15:26 GMT
- Title: Large deviations rates for stochastic gradient descent with strongly
convex functions
- Authors: Dragana Bajovic, Dusan Jakovetic, Soummya Kar
- Abstract summary: We provide a formal framework for the study of general high probability bounds with stochastic gradient descent (SGD).
We find an upper large deviations bound for SGD with strongly convex functions.
- Score: 11.247580943940916
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent works have shown that high probability metrics with stochastic
gradient descent (SGD) exhibit informativeness and in some cases advantage over
the commonly adopted mean-square error-based ones. In this work we provide a
formal framework for the study of general high probability bounds with SGD,
based on the theory of large deviations. The framework allows for a generic
(not-necessarily bounded) gradient noise satisfying mild technical assumptions,
allowing for the dependence of the noise distribution on the current iterate.
Under the preceding assumptions, we find an upper large deviations bound for
SGD with strongly convex functions. The corresponding rate function captures
analytical dependence on the noise distribution and other problem parameters.
This is in contrast with conventional mean-square error analysis that captures
only the noise dependence through the variance and does not capture the effect
of higher order moments nor interplay between the noise geometry and the shape
of the cost function. We also derive exact large deviation rates for the case
when the objective function is quadratic and show that the obtained function
matches the one from the general upper bound hence showing the tightness of the
general upper bound. Numerical examples illustrate and corroborate theoretical
findings.
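As a rough illustration of why the shape and higher-order moments of the gradient noise (and not only its variance) can matter, the short Python sketch below estimates empirical tail probabilities of SGD iterates on a strongly convex quadratic under Gaussian versus heavier-tailed Laplace noise with matched variance. This is not the paper's experimental setup; the objective, step size, threshold, and noise models are illustrative assumptions, and the paper's rate functions are not reproduced here.

```python
# A minimal, self-contained sketch (not the paper's setup): Monte Carlo
# estimation of the tail probability P(||x_k - x*|| >= eps) for SGD on a
# strongly convex quadratic, comparing Gaussian gradient noise with a
# heavier-tailed Laplace noise of equal variance.
import numpy as np

rng = np.random.default_rng(0)

d = 2
A = np.diag([1.0, 4.0])   # Hessian of f(x) = 0.5 * x^T A x; minimizer x* = 0
step = 0.1                # constant step size (assumed, for a simple stationary regime)
eps = 0.5                 # deviation threshold
n_iters = 200
n_runs = 5_000

def tail_prob(noise_sampler):
    """Fraction of runs whose final iterate deviates from x* = 0 by at least eps."""
    exceed = 0
    for _ in range(n_runs):
        x = np.zeros(d)                       # start at the minimizer
        for _ in range(n_iters):
            grad = A @ x + noise_sampler()    # stochastic gradient = true gradient + noise
            x = x - step * grad
        exceed += np.linalg.norm(x) >= eps
    return exceed / n_runs

gaussian = lambda: rng.normal(0.0, 1.0, size=d)
# Laplace noise with scale 1/sqrt(2) has unit variance, matching the Gaussian case.
laplace = lambda: rng.laplace(0.0, 1.0 / np.sqrt(2), size=d)

print("tail probability, Gaussian noise:", tail_prob(gaussian))
print("tail probability, Laplace noise :", tail_prob(laplace))
```

A mean-square analysis treats the two noise models identically, since their variances match, while their tail probabilities, which is what a large deviations analysis characterizes, can differ noticeably.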
Related papers
- High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions [13.431453056203226]
Heavy-tail phenomena in stochastic gradient descent (SGD) have been reported in several empirical observations.
This paper develops generalization bounds for heavy-tailed SGD under general loss functions.
These bounds shed more light on the empirical observations, thanks to the generality of the loss functions.
arXiv Detail & Related papers (2023-01-27T17:57:35Z) - Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Global Convergence and Stability of Stochastic Gradient Descent [0.0]
We show that SGD converges to a stationary point under nearly arbitrary nonconvexity and noise models.
We show that SGD can be applied to a wide range of problems with guarantees of global convergence.
arXiv Detail & Related papers (2021-10-04T19:00:50Z) - Deconfounded Score Method: Scoring DAGs with Dense Unobserved
Confounding [101.35070661471124]
We show that unobserved confounding leaves a characteristic footprint in the observed data distribution that allows for disentangling spurious and causal effects.
We propose an adjusted score-based causal discovery algorithm that may be implemented with general-purpose solvers and scales to high-dimensional problems.
arXiv Detail & Related papers (2021-03-28T11:07:59Z) - On the Convergence of SGD with Biased Gradients [28.400751656818215]
We analyze biased stochastic gradient methods (SGD), where individual updates are corrupted by compression.
We quantify how the magnitude of the bias impacts the attainable accuracy and the convergence rates.
arXiv Detail & Related papers (2020-07-31T19:37:59Z) - Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in stochastic gradient descent provides a crucial implicit regularization effect for training overparameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
arXiv Detail & Related papers (2020-06-15T18:31:02Z)