Convergence of Gradient Algorithms for Nonconvex $C^{1+\alpha}$ Cost
Functions
- URL: http://arxiv.org/abs/2012.00628v1
- Date: Tue, 1 Dec 2020 16:48:59 GMT
- Title: Convergence of Gradient Algorithms for Nonconvex $C^{1+\alpha}$ Cost
Functions
- Authors: Zixuan Wang and Shanjian Tang
- Abstract summary: We show a byproduct that a gradient converges and provide an explicit upper bound on the convergence rate.
This allows us to prove the almost sure convergence by Doobmartin's supergale convergence theorem.
- Score: 2.538209532048867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper is concerned with convergence of stochastic gradient algorithms
with momentum terms in the nonconvex setting. A class of stochastic momentum
methods, including stochastic gradient descent, heavy ball, and Nesterov's
accelerated gradient, is analyzed in a general framework under quite mild
assumptions. We show that the expected gradient converges and provide an
explicit upper bound on the convergence rate. Then a supermartingale can be
constructed by proper approximations of the noise and momentum terms. This
allows us to prove the almost sure convergence by Doob's supermartingale
convergence theorem and a discussion of the number of upcrossings in detail. It
is worth noting that the existing Lipschitz condition of the gradient of the
objective function is relaxed into the condition of H\"older continuity.
Another improvement is that there are no additional restrictions imposed on
stepsizes. As a byproduct, we apply a localization procedure to extend our
results to stochastic stepsizes.
Related papers
- Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation [51.248784084461334]
We prove new convergence rates for a generalized version of Nesterov acceleration underrho conditions.
Our analysis reduces the dependence on the strong growth constant from $$ to $sqrt$ as compared to prior work.
arXiv Detail & Related papers (2024-04-03T00:41:19Z) - Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
An analysis of convex and non- optimization methods often requires the Lipsitzness gradient, which limits the analysis by this trajectorys.
Recent work generalizes the gradient setting via the non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z) - Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $epsilon$-stationary points with $O(epsilon-4)$ gradient complexity under far more realistic conditions.
We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(epsilon-3)$.
arXiv Detail & Related papers (2023-04-27T06:27:37Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $min_mathbfxmax_mathbfyF(mathbfx) + H(mathbfx,mathbfy)$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present a emphaccelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragrad
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - On Almost Sure Convergence Rates of Stochastic Gradient Methods [11.367487348673793]
We show for the first time that the almost sure convergence rates obtained for gradient methods are arbitrarily close to their optimal convergence rates possible.
For non- objective functions, we only show that a weighted average of squared gradient norms converges not almost surely, but also lasts to zero almost surely.
arXiv Detail & Related papers (2022-02-09T06:05:30Z) - A Unified Analysis of First-Order Methods for Smooth Games via Integral
Quadratic Constraints [10.578409461429626]
In this work, we adapt the integral quadratic constraints theory to first-order methods for smooth and strongly-varying games and iteration.
We provide emphfor the first time a global convergence rate for the negative momentum method(NM) with an complexity $mathcalO(kappa1.5)$, which matches its known lower bound.
We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch.
arXiv Detail & Related papers (2020-09-23T20:02:00Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of gradient descent (SGD)
We show that SGD avoids saddle points/manifolds with $1$ for strict step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - Almost sure convergence rates for Stochastic Gradient Descent and
Stochastic Heavy Ball [17.33867778750777]
We study gradient descent (SGD) and the heavy ball method (SHB) for the general approximation problem.
For SGD, in the convex and smooth setting, we provide the first emphalmost sure convergence emphrates for a weighted average of the iterates.
arXiv Detail & Related papers (2020-06-14T11:12:05Z) - The Convergence Indicator: Improved and completely characterized
parameter bounds for actual convergence of Particle Swarm Optimization [68.8204255655161]
We introduce a new convergence indicator that can be used to calculate whether the particles will finally converge to a single point or diverge.
Using this convergence indicator we provide the actual bounds completely characterizing parameter regions that lead to a converging swarm.
arXiv Detail & Related papers (2020-06-06T19:08:05Z) - On the Convergence of Nesterov's Accelerated Gradient Method in
Stochastic Settings [5.101801159418223]
We study Nesterov's accelerated gradient method with constant step-size and momentum parameters.
In the finite-sum setting, we prove that Nesterov's method may diverge with the usual choice of step-size and momentum.
arXiv Detail & Related papers (2020-02-27T19:56:41Z) - Stochastic gradient-free descents [8.663453034925363]
We propose gradient-free methods and accelerated gradients with momentum for solving optimization problems.
We analyze the convergence behavior of these methods under the mean-variance framework.
arXiv Detail & Related papers (2019-12-31T13:56:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.