Revisiting Stochastic Approximation and Stochastic Gradient Descent
- URL: http://arxiv.org/abs/2505.11343v2
- Date: Fri, 13 Jun 2025 14:45:52 GMT
- Title: Revisiting Stochastic Approximation and Stochastic Gradient Descent
- Authors: Rajeeva Laxman Karandikar, Bhamidi Visweswara Rao, Mathukumalli Vidyasagar
- Abstract summary: We introduce a new approach to proving the convergence of the Stochastic Approximation (SA) and Stochastic Gradient Descent (SGD) algorithms. The new approach is based on a concept called GSLLN (Generalized Strong Law of Large Numbers), which extends the traditional SLLN.
- Score: 2.4092466208212313
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce a new approach to proving the convergence of the Stochastic Approximation (SA) and the Stochastic Gradient Descent (SGD) algorithms. The new approach is based on a concept called GSLLN (Generalized Strong Law of Large Numbers), which extends the traditional SLLN. Using this concept, we provide sufficient conditions for convergence, which effectively decouple the properties of the function whose zero we are trying to find from the properties of the measurement errors (noise sequence). The new approach provides an alternative to the two widely used approaches, namely the ODE approach and the martingale approach, and also permits a wider class of noise signals than either of the two known approaches. In particular, the "noise" or measurement error need not have a finite second moment, and under suitable conditions, not even a finite mean. By adapting this method of proof, we also derive sufficient conditions for the convergence of zero-order SGD, wherein the stochastic gradient is computed using $2d$ function evaluations, but no gradient computations. The sufficient conditions derived here are the weakest to date, thus leading to a considerable expansion of the applicability of SA and SGD theory.
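The zero-order scheme mentioned in the abstract lends itself to a brief illustration. The following is a minimal sketch, not the authors' construction: it forms a central-difference gradient estimate from $2d$ function evaluations (two per coordinate) and feeds it into an SGD loop; the function names and the step-size and smoothing schedules are illustrative assumptions.

```python
import numpy as np

def zero_order_gradient(f, x, delta):
    """Central-difference gradient estimate: 2 evaluations of f per coordinate, 2d in total."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = delta
        g[i] = (f(x + e) - f(x - e)) / (2.0 * delta)
    return g

def zero_order_sgd(f, x0, steps=2000):
    """SGD driven by the zero-order estimate; no gradient computations anywhere."""
    x = x0.astype(float)
    for t in range(1, steps + 1):
        alpha = 1.0 / t            # Robbins-Monro: sum(alpha) = inf, sum(alpha^2) < inf
        delta = t ** -0.25         # slowly shrinking difference width
        x = x - alpha * zero_order_gradient(f, x, delta)
    return x

# Usage: minimize a quadratic observed with additive measurement noise.
rng = np.random.default_rng(0)
noisy_f = lambda x: float(np.sum(x ** 2)) + 0.01 * rng.standard_normal()
print(zero_order_sgd(noisy_f, np.array([2.0, -1.5])))
```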
Related papers
- Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness [50.78508362183774]
Shuffling-type gradient methods are favored in practice for their simplicity and rapid empirical performance.
Most convergence analyses, however, require the Lipschitz smoothness condition, which is often not met in common machine learning settings (a random-reshuffling sketch follows the entry).
arXiv Detail & Related papers (2025-07-11T15:36:48Z)
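For reference, here is a minimal random-reshuffling SGD sketch, the baseline shuffling-type method. The per-sample gradient oracle `grad_i` and all parameter names are hypothetical, and nothing here encodes the paper's relaxed smoothness conditions.

```python
import numpy as np

def shuffling_sgd(grad_i, x0, n_samples, epochs=20, lr=0.01, seed=0):
    """Random reshuffling: each epoch visits every sample exactly once in a fresh random order."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float)
    for _ in range(epochs):
        for i in rng.permutation(n_samples):   # sampling without replacement
            x = x - lr * grad_i(x, i)
    return x
```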
- Stochastic Gradient Descent in Non-Convex Problems: Asymptotic Convergence with Relaxed Step-Size via Stopping Time Methods [13.677904140815386]
Stochastic Gradient Descent (SGD) is widely used in machine learning research.
This paper introduces a novel analytical method for the convergence analysis of SGD under more relaxed step-size conditions.
arXiv Detail & Related papers (2025-04-17T02:56:20Z)
- From Gradient Clipping to Normalization for Heavy Tailed SGD [19.369399536643773]
Recent empirical evidence indicates that machine learning applications involve heavy-tailed noise, which challenges the standard assumption of bounded variance in practice.
In this paper, we show that it is possible to achieve tight convergence guarantees under heavy-tailed noise (a normalized-SGD sketch follows the entry).
arXiv Detail & Related papers (2024-10-17T17:59:01Z)
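A minimal sketch of normalized SGD, the alternative to clipping named in the title; parameter names are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def normalized_sgd(grad, x0, steps=1000, lr=0.01, eps=1e-12):
    """Normalized SGD: step along g / ||g||, so a single heavy-tailed gradient
    can never produce an unbounded update, and there is no clipping threshold to tune."""
    x = x0.astype(float)
    for _ in range(steps):
        g = grad(x)                                # possibly heavy-tailed stochastic gradient
        x = x - lr * g / (np.linalg.norm(g) + eps)
    return x
```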
- Provable Complexity Improvement of AdaGrad over SGD: Upper and Lower Bounds in Stochastic Non-Convex Optimization [18.47705532817026]
Adaptive gradient methods are among the most successful neural network training algorithms.
These methods are known to achieve better dimensional dependence than SGD in stochastic non-convex optimization.
In this paper we introduce new assumptions on the smoothness structure of the objective and the gradient noise variance (an AdaGrad sketch follows the entry).
arXiv Detail & Related papers (2024-06-07T02:55:57Z)
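A minimal diagonal-AdaGrad sketch for reference; the per-coordinate adaptive step size is the mechanism behind the dimension dependence discussed above. Parameter names are illustrative.

```python
import numpy as np

def adagrad(grad, x0, steps=1000, lr=0.1, eps=1e-8):
    """Diagonal AdaGrad: each coordinate's step size shrinks with its own
    accumulated squared gradients."""
    x = x0.astype(float)
    s = np.zeros_like(x)          # running sum of squared gradients, per coordinate
    for _ in range(steps):
        g = grad(x)
        s = s + g * g
        x = x - lr * g / (np.sqrt(s) + eps)
    return x
```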
- Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation [51.248784084461334]
We prove new convergence rates for a generalized version of stochastic Nesterov acceleration under interpolation conditions.
Our analysis reduces the dependence on the strong growth constant from $\rho$ to $\sqrt{\rho}$ as compared to prior work.
arXiv Detail & Related papers (2024-04-03T00:41:19Z)
- Exact Mean Square Linear Stability Analysis for SGD [28.65663421598186]
We provide an explicit condition on the step size that is both necessary and sufficient for the linear stability of stochastic gradient descent (SGD).
We show that SGD's stability threshold is equivalent to that of a mixture process which takes, in each iteration, a full-batch gradient step w.p. $1-p$ and a single-sample gradient step w.p. $p$ (a simulation sketch follows the entry).
arXiv Detail & Related papers (2023-06-13T15:29:23Z)
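The mixture process in the summary above admits a one-line simulation. A minimal sketch, assuming a list of per-sample gradient functions; all names are illustrative.

```python
import numpy as np

def mixture_step(x, per_sample_grads, p, lr, rng):
    """One step of the mixture process: a full-batch gradient step w.p. 1-p,
    a single uniformly chosen sample's gradient step w.p. p."""
    if rng.random() < p:
        g = per_sample_grads[rng.integers(len(per_sample_grads))](x)
    else:
        g = np.mean([gi(x) for gi in per_sample_grads], axis=0)
    return x - lr * g
```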
- Convergence of the Stochastic Heavy Ball Method With Approximate Gradients and/or Block Updating [0.6445605125467572]
We establish the convergence of the Stochastic Heavy Ball (SHB) algorithm under more general conditions than in the current literature.
Our analysis embraces not only convex functions, but also more general functions that satisfy the PL (Polyak-Lojasiewicz) and KL (Kurdyka-Lojasiewicz) conditions (an SHB sketch follows the entry).
arXiv Detail & Related papers (2023-03-28T18:34:52Z)
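A minimal SHB sketch with exact gradients and no block updating, so it illustrates only the basic update; all names are assumptions.

```python
import numpy as np

def stochastic_heavy_ball(grad, x0, steps=1000, lr=0.01, momentum=0.9):
    """Stochastic Heavy Ball: reuse the previous displacement as a momentum term."""
    x_prev = x0.astype(float)
    x = x_prev.copy()
    for _ in range(steps):
        x_next = x - lr * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x
```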
- Minimax Instrumental Variable Regression and $L_2$ Convergence Guarantees without Identification or Closedness [71.42652863687117]
We study nonparametric estimation of instrumental variable (IV) regressions.
We propose a new penalized minimax estimator that can converge to a fixed IV solution.
We derive a strong $L_2$ error rate for our estimator under lax conditions.
arXiv Detail & Related papers (2023-02-10T18:08:49Z)
- Utilising the CLT Structure in Stochastic Gradient based Sampling: Improved Analysis and Faster Algorithms [14.174806471635403]
We consider approximations of sampling algorithms, such as Stochastic Gradient Langevin Dynamics (SGLD) and the Random Batch Method (RBM) for Interacting Particle Dynamics (IPD).
We observe that the noise introduced by the approximation is nearly Gaussian due to the Central Limit Theorem (CLT) while the driving Brownian motion is exactly Gaussian.
We harness this structure to absorb the approximation error inside the diffusion process, and obtain improved convergence guarantees for these algorithms.
arXiv Detail & Related papers (2022-06-08T10:17:40Z)
- Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching (a descent-ascent sketch follows the entry).
arXiv Detail & Related papers (2021-06-30T18:32:46Z)
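A minimal SGDA sketch for a min-max problem $\min_x \max_y f(x, y)$; the stochastic gradient oracles and step size are illustrative assumptions.

```python
import numpy as np

def sgda(grad_x, grad_y, x0, y0, steps=1000, lr=0.05):
    """Simultaneous stochastic gradient descent-ascent with a constant step size:
    descend in x, ascend in y."""
    x, y = x0.astype(float), y0.astype(float)
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)   # evaluate both before updating either
        x = x - lr * gx
        y = y + lr * gy
    return x, y
```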
- On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, along with variations of the method that yield favorable convergence.
We prove that, when augmented with iteration averaging, SEG provably converges to the Nash equilibrium, and that this rate is provably accelerated by incorporating a scheduled restarting procedure (an extragradient sketch follows the entry).
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
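A minimal sketch of SEG with iteration averaging; the gradient oracles and parameters are illustrative, and the scheduled restart (periodically resetting the iterates to the averages) is noted but omitted.

```python
import numpy as np

def seg_step(grad_x, grad_y, x, y, lr):
    """One stochastic extragradient step: extrapolate to a mid-point,
    then update from gradients evaluated at that mid-point."""
    x_mid, y_mid = x - lr * grad_x(x, y), y + lr * grad_y(x, y)
    return x - lr * grad_x(x_mid, y_mid), y + lr * grad_y(x_mid, y_mid)

def seg_with_averaging(grad_x, grad_y, x0, y0, steps=1000, lr=0.05):
    """SEG with a running average of the iterates; a scheduled restart would
    periodically reset (x, y) to the averages and restart the average."""
    x, y = x0.astype(float), y0.astype(float)
    x_bar, y_bar = x.copy(), y.copy()
    for t in range(1, steps + 1):
        x, y = seg_step(grad_x, grad_y, x, y, lr)
        x_bar += (x - x_bar) / (t + 1)   # running average of the iterates
        y_bar += (y - y_bar) / (t + 1)
    return x_bar, y_bar
```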
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth stochastic convex optimization have complexity bounds that depend on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping (a clipped-SGD sketch follows the entry).
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
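A minimal clipped-SGD sketch showing the clipping mechanism referenced above; the stepsize rules of the paper are not reproduced, and all names are illustrative.

```python
import numpy as np

def clipped_sgd(grad, x0, steps=1000, lr=0.01, clip=1.0):
    """SGD with gradient clipping: rescale any stochastic gradient whose norm
    exceeds the threshold, which tames heavy-tailed noise."""
    x = x0.astype(float)
    for _ in range(steps):
        g = grad(x)
        norm = np.linalg.norm(g)
        if norm > clip:
            g = g * (clip / norm)
        x = x - lr * g
    return x
```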
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out-of-Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave (an SGLD sketch follows the entry).
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z)
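A minimal SGLD sketch; the gradient oracle `grad_log_p` (of the log-density) and all parameters are illustrative assumptions.

```python
import numpy as np

def sgld(grad_log_p, x0, steps=5000, lr=1e-3, seed=0):
    """Stochastic gradient Langevin dynamics: a (stochastic) gradient step on
    log p plus Gaussian noise scaled as sqrt(2 * step size)."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float)
    for _ in range(steps):
        x = x + lr * grad_log_p(x) + np.sqrt(2.0 * lr) * rng.standard_normal(x.size)
    return x   # approximate sample from p after burn-in
```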
- Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting.
To the best of our knowledge, this gives the first-ever-known stability and generalization bounds for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z)
- Convergence rates and approximation results for SGD and its continuous-time counterpart [16.70533901524849]
This paper proposes a thorough theoretical analysis of the convergence of Stochastic Gradient Descent (SGD) with non-increasing step sizes.
First, we show that SGD can be provably approximated by solutions of an inhomogeneous Stochastic Differential Equation (SDE) using coupling.
Motivated by recent analyses of deterministic and stochastic optimization methods through their continuous-time counterparts, we study the long-time behavior of the continuous processes at hand and establish non-asymptotic bounds (an Euler-Maruyama sketch follows the entry).
arXiv Detail & Related papers (2020-04-08T18:31:34Z)
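To make the SGD-to-SDE comparison in the last entry concrete, here is a minimal Euler-Maruyama sketch with a constant diffusion coefficient; the paper's SDE is inhomogeneous, so this simplification, like all names here, is an assumption for illustration.

```python
import numpy as np

def euler_maruyama(drift, sigma, x0, dt, steps, seed=0):
    """Euler-Maruyama discretization of dX_t = drift(X_t) dt + sigma dW_t;
    SGD with small non-increasing steps behaves like such a process."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float)
    for _ in range(steps):
        x = x + drift(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.size)
    return x
```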
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information on this site and is not responsible for any consequences.