Stochastic Gradient Descent Revisited
- URL: http://arxiv.org/abs/2412.06070v2
- Date: Sun, 09 Feb 2025 20:39:33 GMT
- Title: Stochastic Gradient Descent Revisited
- Authors: Azar Louzi,
- Abstract summary: gradient descent (SGD) has been a go-to algorithm for non gradient convergence problems in machine learning.
This paper presents a full scope convergence theory and provides rates and complexities.
- Score: 0.0
- License:
- Abstract: Stochastic gradient descent (SGD) has been a go-to algorithm for nonconvex stochastic optimization problems arising in machine learning. Its theory however often requires a strong framework to guarantee convergence properties. We hereby present a full scope convergence study of biased nonconvex SGD, including weak convergence, function-value convergence and global convergence, and also provide subsequent convergence rates and complexities, all under relatively mild conditions in comparison with literature.
Related papers
- Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation [51.248784084461334]
We prove new convergence rates for a generalized version of Nesterov acceleration underrho conditions.
Our analysis reduces the dependence on the strong growth constant from $$ to $sqrt$ as compared to prior work.
arXiv Detail & Related papers (2024-04-03T00:41:19Z) - Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $epsilon$-stationary points with $O(epsilon-4)$ gradient complexity under far more realistic conditions.
We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(epsilon-3)$.
arXiv Detail & Related papers (2023-04-27T06:27:37Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Distributed Stochastic Optimization under a General Variance Condition [13.911633636387059]
Distributed optimization has drawn great attention recently due to its effectiveness in solving largescale machine learning problems.
We revisit the classical Federated Averaging (Avg) and establish the convergence results under only a mild variance for smooth non objective functions.
Almost a stationary convergence point is also established under the gradients condition.
arXiv Detail & Related papers (2023-01-30T05:48:09Z) - Fixed-Time Convergence for a Class of Nonconvex-Nonconcave Min-Max
Problems [5.787117733071416]
We develop a fixed-time convergent saddle point dynamical system for solving min-max problems.
The proposed method achieves fast compared to any other state method.
arXiv Detail & Related papers (2022-07-26T12:25:05Z) - On the Convergence of mSGD and AdaGrad for Stochastic Optimization [0.696125353550498]
convex descent (SGD) has been intensively developed and extensively applied in machine learning in the past decade.
Some modified SGD-type algorithms, which outperform the SGD in many competitions and applications in terms of convergence rate and accuracy, such as momentum-based SGD (mSGD) and adaptive gradient optimization (AdaGrad)
We focus on convergence analysis of mSGD and AdaGrad for any smooth (possibly non-possibly non-possibly non-possibly) loss functions in machine learning.
arXiv Detail & Related papers (2022-01-26T22:02:21Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the emphStochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth
Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization extraient spaces which converges to a stationary point.
The algorithm applies not only to general $p$-normed spaces, but also to general $p$-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z) - SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and
Interpolation [17.199023009789308]
The Expected assumption of SGD (SGD) is being used routinely for non-artisan functions.
In this paper, we show a paradigms for convergence to a smooth non-linear setting.
We also provide theoretical guarantees for different step-size conditions.
arXiv Detail & Related papers (2020-06-18T07:05:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.