On the Convergence of Nesterov's Accelerated Gradient Method in
Stochastic Settings
- URL: http://arxiv.org/abs/2002.12414v2
- Date: Sat, 27 Jun 2020 20:01:59 GMT
- Title: On the Convergence of Nesterov's Accelerated Gradient Method in
Stochastic Settings
- Authors: Mahmoud Assran and Michael Rabbat
- Abstract summary: We study Nesterov's accelerated gradient method with constant step-size and momentum parameters.
In the finite-sum setting, we prove that Nesterov's method may diverge with the usual choice of step-size and momentum.
- Score: 5.101801159418223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study Nesterov's accelerated gradient method with constant step-size and
momentum parameters in the stochastic approximation setting (unbiased gradients
with bounded variance) and the finite-sum setting (where randomness is due to
sampling mini-batches). To build better insight into the behavior of Nesterov's
method in stochastic settings, we focus throughout on objectives that are
smooth, strongly-convex, and twice continuously differentiable. In the
stochastic approximation setting, Nesterov's method converges to a neighborhood
of the optimal point at the same accelerated rate as in the deterministic
setting. Perhaps surprisingly, in the finite-sum setting, we prove that
Nesterov's method may diverge with the usual choice of step-size and momentum,
unless additional conditions on the problem related to conditioning and data
coherence are satisfied. Our results shed light as to why Nesterov's method may
fail to converge or achieve acceleration in the finite-sum setting.
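To make the setting concrete, the iteration under study can be sketched as follows for a finite-sum least-squares objective. This is an illustrative reconstruction, not the authors' code (the objective, dimensions, and variable names are our assumptions); the "usual" choices referred to above are alpha = 1/L and beta = (sqrt(kappa) - 1)/(sqrt(kappa) + 1) with kappa = L/mu.

```python
import numpy as np

def nesterov_finite_sum(A, b, alpha, beta, batch_size, iters, seed=0):
    """Nesterov's method with constant step-size alpha and momentum beta on the
    finite-sum least-squares objective f(x) = (1/n) sum_i 0.5*(a_i^T x - b_i)^2,
    with randomness coming only from mini-batch sampling (the paper's
    finite-sum setting). Illustrative sketch, not the authors' code."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    y = x.copy()
    for _ in range(iters):
        idx = rng.choice(n, size=batch_size, replace=False)   # sample a mini-batch
        grad = A[idx].T @ (A[idx] @ y - b[idx]) / batch_size  # stochastic gradient at y
        x_next = y - alpha * grad                             # gradient step
        y = x_next + beta * (x_next - x)                      # momentum step
        x = x_next
    return x

# The "usual" deterministic choices for an L-smooth, mu-strongly-convex objective:
# alpha = 1/L and beta = (sqrt(kappa) - 1)/(sqrt(kappa) + 1), kappa = L/mu.
# The paper shows exactly these choices can make the finite-sum iteration diverge.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 10))
b = rng.standard_normal(200)
evals = np.linalg.eigvalsh(A.T @ A / 200)   # spectrum of the Hessian
mu, L = evals[0], evals[-1]
kappa = L / mu
beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
x_hat = nesterov_finite_sum(A, b, alpha=1/L, beta=beta, batch_size=10, iters=2000)
```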
Related papers
- Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates [61.091122503406304]
We show that the gradient bandit algorithm converges to a globally optimal policy almost surely using any constant learning rate.
This result demonstrates that the gradient bandit algorithm continues to balance exploration and exploitation appropriately even in scenarios where standard smoothness and noise-control assumptions break down.
arXiv Detail & Related papers (2025-02-11T00:12:04Z)
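For context, the gradient bandit algorithm referenced above is the classical softmax policy-gradient scheme; the following is a toy sketch of one common variant (the running-average baseline is our assumption), not the paper's code.

```python
import numpy as np

def gradient_bandit(true_means, alpha, steps, seed=0):
    """Softmax gradient bandit with a constant learning rate alpha (textbook
    form with a running-average baseline); a toy sketch, not the paper's code."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    h = np.zeros(k)                                # action preferences
    baseline = 0.0
    for t in range(1, steps + 1):
        pi = np.exp(h - h.max()); pi /= pi.sum()   # softmax policy
        a = rng.choice(k, p=pi)                    # pull an arm
        r = true_means[a] + rng.standard_normal()  # noisy reward
        baseline += (r - baseline) / t             # running average reward
        grad = -pi; grad[a] += 1.0                 # indicator(a) - pi
        h += alpha * (r - baseline) * grad         # constant-step update
    return np.exp(h - h.max()) / np.exp(h - h.max()).sum()

# The policy concentrates on the best arm (index 2 here) with a constant alpha.
print(gradient_bandit([0.1, 0.5, 0.9], alpha=0.1, steps=5000))
```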
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing an approximation of the objective subject to a trust-region constraint.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z)
- Accelerated gradient methods for nonconvex optimization: Escape trajectories from strict saddle points and convergence to local minima [9.66353475649122]
This paper considers the problem of understanding the behavior of a general class of accelerated gradient methods on smooth nonconvex functions.
It shows that Nesterov's accelerated gradient method (NAG) with variable momentum avoids strict saddle points almost surely.
arXiv Detail & Related papers (2023-07-13T19:11:07Z)
- Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
Classical analysis of convex and non-convex optimization methods often requires the Lipschitz continuity of the gradient, which limits the analysis to functions bounded by quadratics.
Recent work generalizes this setting via a non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Łojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
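A minimal sketch of the sampling schemes compared in the entry above: Random Reshuffling draws component gradients without replacement within each epoch, versus the with-replacement baseline. The least-squares example and names are illustrative assumptions, not the paper's algorithms verbatim.

```python
import numpy as np

def sgd_epochs(grad_i, n, x0, lr, epochs, without_replacement=True, seed=0):
    """SGD over a finite sum of n components. With without_replacement=True,
    each epoch visits every component exactly once in a fresh random order
    (Random Reshuffling); with False it samples with replacement, for
    comparison. Schematic sketch only."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(epochs):
        order = rng.permutation(n) if without_replacement else rng.integers(0, n, size=n)
        for i in order:
            x -= lr * grad_i(i, x)   # one component-gradient step
    return x

# Toy least-squares usage: grad of 0.5*(a_i^T x - b_i)^2 is a_i*(a_i^T x - b_i).
A = np.random.default_rng(2).standard_normal((100, 5))
b = A @ np.ones(5)
g = lambda i, x: A[i] * (A[i] @ x - b[i])
x_rr = sgd_epochs(g, 100, np.zeros(5), lr=0.05, epochs=50)
```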
- Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z)
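A generic sketch of constant step-size SGDA on a smooth strongly-convex-strongly-concave game, illustrating the convergence to a neighborhood of the solution described above; the toy game and additive noise model are our assumptions, not the paper's code.

```python
import numpy as np

def sgda(gx, gy, x0, y0, step, iters, noise=0.1, seed=0):
    """Simultaneous stochastic gradient descent-ascent with a constant step
    size on min_x max_y f(x, y); gx, gy return the partial gradients, and
    stochasticity is modeled as additive gradient noise. Generic sketch."""
    rng = np.random.default_rng(seed)
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    for _ in range(iters):
        ex = noise * rng.standard_normal(x.shape)
        ey = noise * rng.standard_normal(y.shape)
        x, y = (x - step * (gx(x, y) + ex),   # descent on x
                y + step * (gy(x, y) + ey))   # ascent on y
    return x, y

# Strongly-convex-strongly-concave toy game f(x, y) = 0.5*x^2 + x*y - 0.5*y^2:
# with a constant step, the iterates reach a noise-dominated neighborhood of (0, 0).
x, y = sgda(lambda x, y: x + y, lambda x, y: x - y, 1.0, 1.0, step=0.05, iters=2000)
```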
- On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that, when augmented with iteration averaging, SEG converges to the Nash equilibrium, and that this rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
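A sketch of stochastic extragradient with iteration averaging on a bilinear game, matching the averaged scheme described above but omitting the scheduled restarts; the noise model and names are assumptions.

```python
import numpy as np

def seg_averaged(B, step, iters, noise=0.1, seed=0):
    """Stochastic extragradient (SEG) with constant step size and iteration
    averaging on the bilinear game min_x max_y x^T B y; noisy operator
    evaluations model the stochasticity. Sketch without scheduled restarts."""
    rng = np.random.default_rng(seed)
    d = B.shape[0]
    x, y = np.ones(d), np.ones(d)
    x_avg, y_avg = x.copy(), y.copy()
    for t in range(1, iters + 1):
        # extrapolation ("look-ahead") step
        xh = x - step * (B @ y + noise * rng.standard_normal(d))
        yh = y + step * (B.T @ x + noise * rng.standard_normal(d))
        # update step, evaluated at the extrapolated point
        x = x - step * (B @ yh + noise * rng.standard_normal(d))
        y = y + step * (B.T @ xh + noise * rng.standard_normal(d))
        x_avg += (x - x_avg) / t   # running averages of the iterates
        y_avg += (y - y_avg) / t
    return x_avg, y_avg

x_bar, y_bar = seg_averaged(np.eye(3), step=0.1, iters=5000)  # equilibrium at (0, 0)
```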
- A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip [23.519404358187156]
We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.
We show that the discretization has the same structure as Nesterov acceleration, but with random parameters.
We provide the first rigorous acceleration of asynchronous gossip algorithms.
arXiv Detail & Related papers (2021-06-10T08:35:55Z)
- Convergence of Gradient Algorithms for Nonconvex $C^{1+\alpha}$ Cost Functions [2.538209532048867]
As a byproduct, we show that the gradient converges and provide an explicit upper bound on the convergence rate.
This allows us to prove almost sure convergence via Doob's supermartingale convergence theorem.
arXiv Detail & Related papers (2020-12-01T16:48:59Z)
- Stochastic gradient-free descents [8.663453034925363]
We propose stochastic gradient-free methods and accelerated methods with momentum for solving stochastic optimization problems.
We analyze the convergence behavior of these methods under the mean-variance framework.
arXiv Detail & Related papers (2019-12-31T13:56:36Z)
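The gradient-free-with-momentum idea in the entry above can be sketched with a two-point finite-difference gradient estimate; the estimator and heavy-ball form are assumptions about this family of methods, not the paper's exact algorithm.

```python
import numpy as np

def zo_momentum(f, x0, step, beta, mu, iters, seed=0):
    """Gradient-free method with heavy-ball momentum: the gradient is replaced
    by a two-point finite-difference estimate along a random direction.
    Generic sketch of this method family (parameter names are assumptions)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(iters):
        u = rng.standard_normal(x.shape)                    # random direction
        g = (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u  # two-point estimate
        v = beta * v - step * g                             # momentum accumulation
        x = x + v
    return x

x_hat = zo_momentum(lambda z: np.sum((z - 1.0) ** 2), np.zeros(5),
                    step=0.02, beta=0.9, mu=1e-4, iters=3000)
```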
This list is automatically generated from the titles and abstracts of the papers on this site.