Almost sure convergence rates for Stochastic Gradient Descent and
Stochastic Heavy Ball
- URL: http://arxiv.org/abs/2006.07867v2
- Date: Fri, 5 Feb 2021 13:40:31 GMT
- Title: Almost sure convergence rates for Stochastic Gradient Descent and
Stochastic Heavy Ball
- Authors: Othmane Sebbouh, Robert M. Gower and Aaron Defazio
- Abstract summary: We study stochastic gradient descent (SGD) and the stochastic heavy ball method (SHB) for the general stochastic approximation problem.
For SGD, in the convex and smooth setting, we provide the first \emph{almost sure} convergence \emph{rates} for a weighted average of the iterates.
- Score: 17.33867778750777
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study stochastic gradient descent (SGD) and the stochastic heavy ball
method (SHB, otherwise known as the momentum method) for the general stochastic
approximation problem.
For SGD, in the convex and smooth setting, we provide the first \emph{almost
sure} asymptotic convergence \emph{rates} for a weighted average of the
iterates. More precisely, we show that the convergence rate of the function
values is arbitrarily close to $o(1/\sqrt{k})$, and is exactly $o(1/k)$ in the
so-called overparametrized case. We show that these results still hold when
using stochastic line search and stochastic Polyak stepsizes, thereby giving
the first proof of convergence of these methods in the non-overparametrized
regime.
Using a substantially different analysis, we show that these rates hold for
SHB as well, but at the last iterate. This distinction is important because it
is the last iterate of SGD and SHB which is used in practice. We also show that
the last iterate of SHB converges to a minimizer \emph{almost surely}.
Additionally, we prove that the function values of the deterministic HB
converge at a $o(1/k)$ rate, which is faster than the previously known
$O(1/k)$.
Finally, in the nonconvex setting, we prove similar rates on the lowest
gradient norm along the trajectory of SGD.
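To make the SGD results above concrete, here is a minimal Python sketch of SGD with a stochastic Polyak stepsize and a weighted average of the iterates, run on an interpolating least-squares toy problem. It is an illustration under assumptions, not the paper's exact scheme: the stepsize cap `gamma_max` and the averaging weights proportional to $k$ are illustrative choices.

```python
import numpy as np

# Toy interpolating least-squares problem: b = A @ x_star exactly, so each
# per-sample loss f_i attains a minimum value of 0 (overparametrized-style).
rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star

def loss_i(x, i):
    r = A[i] @ x - b[i]
    return 0.5 * r ** 2

def grad_i(x, i):
    return (A[i] @ x - b[i]) * A[i]

x = np.zeros(d)
x_avg = np.zeros(d)   # weighted average of the iterates
weight_sum = 0.0
gamma_max = 1.0       # assumed cap on the stepsize

for k in range(1, 5001):
    i = rng.integers(n)
    g = grad_i(x, i)
    # Stochastic Polyak stepsize: (f_i(x) - f_i^*) / ||grad f_i(x)||^2,
    # capped at gamma_max; here f_i^* = 0 because the data interpolate.
    gamma = min(loss_i(x, i) / (g @ g + 1e-12), gamma_max)
    x = x - gamma * g
    # Running weighted average with weight w_k = k (illustrative choice).
    weight_sum += k
    x_avg += (k / weight_sum) * (x - x_avg)

print("last iterate error:     ", np.linalg.norm(x - x_star))
print("weighted average error: ", np.linalg.norm(x_avg - x_star))
```

Weights that grow with $k$ keep the average from being dominated by early, inaccurate iterates, which is the usual motivation for weighted rather than uniform averaging.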
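Similarly, a minimal sketch of the stochastic heavy ball (SHB / momentum) update on the same kind of toy problem; the fixed constants `gamma` and `beta` below are assumptions, whereas the paper's rates rely on carefully chosen parameters.

```python
import numpy as np

# SHB update: x_{k+1} = x_k - gamma * g_k + beta * (x_k - x_{k-1}).
rng = np.random.default_rng(1)
n, d = 200, 10
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star  # interpolating least-squares data, as above

gamma, beta = 0.002, 0.9  # assumed stepsize and momentum parameter
x_prev = np.zeros(d)
x = np.zeros(d)

for k in range(20000):
    i = rng.integers(n)
    g = (A[i] @ x - b[i]) * A[i]  # stochastic gradient from one sample
    # Tuple assignment evaluates the right-hand side with the old x,
    # so x_prev correctly becomes the previous iterate.
    x, x_prev = x - gamma * g + beta * (x - x_prev), x

# Note it is the last iterate x, not an average, that is monitored here,
# matching the last-iterate guarantees discussed above.
print("last-iterate error:", np.linalg.norm(x - x_star))
```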
Related papers
- Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods [25.831462008050387]
The Stochastic Gradient Descent (SGD) algorithm has attracted interest due to its good performance in practice but a lack of theoretical understanding.
It remains unclear whether last-iterate convergence can be provably extended to broader composite optimization problems and non-Euclidean norms.
arXiv Detail & Related papers (2023-12-13T21:41:06Z) - Convergence Rates for Stochastic Approximation: Biased Noise with Unbounded Variance, and Applications [2.0584253077707477]
We study the convergence properties of the Stochastic Gradient Descent (SGD) method for finding a stationary point of an objective function $J(\cdot)$.
Our results apply to a class of ``invex'' functions, which have the property that every stationary point is also a global minimizer.
arXiv Detail & Related papers (2023-12-05T15:22:39Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped stochastic optimization problems with heavy-tailed noise having a structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha - 1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}} F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an \emph{accelerated gradient-extragradient} (AG-EG) descent-ascent algorithm that combines extragradient steps with Nesterov acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - On Almost Sure Convergence Rates of Stochastic Gradient Methods [11.367487348673793]
We show for the first time that the almost sure convergence rates obtained for stochastic gradient methods are arbitrarily close to the best possible convergence rates.
For non-convex objective functions, we show not only that a weighted average of the squared gradient norms converges to zero almost surely, but also that the squared gradient norm of the last iterate converges to zero almost surely.
arXiv Detail & Related papers (2022-02-09T06:05:30Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high probability with the best-known iteration complexity for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - Fast Rates for the Regret of Offline Reinforcement Learning [69.23654172273085]
We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinite-horizon discounted Markov decision process (MDP).
We show that given any estimate for the optimal quality function $Q^*$, the regret of the policy it defines converges at a rate given by the exponentiation of the $Q^*$-estimate's pointwise convergence rate.
arXiv Detail & Related papers (2021-01-31T16:17:56Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD).
We show that SGD avoids strict saddle points/manifolds with probability $1$ for a wide range of step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We give a simple proof of convergence for both Adam and Adagrad, with a rate of $O(d\ln(N)/\sqrt{N})$.
Adam converges at the same $O(d\ln(N)/\sqrt{N})$ rate when used with its default parameters.
arXiv Detail & Related papers (2020-03-05T01:56:17Z) - Stochastic gradient-free descents [8.663453034925363]
We propose stochastic gradient-free methods and accelerated methods with momentum for solving stochastic optimization problems.
We analyze the convergence behavior of these methods under the mean-variance framework.
arXiv Detail & Related papers (2019-12-31T13:56:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.