The Instability of Accelerated Gradient Descent
- URL: http://arxiv.org/abs/2102.02167v1
- Date: Wed, 3 Feb 2021 17:50:42 GMT
- Title: The Instability of Accelerated Gradient Descent
- Authors: Amit Attia and Tomer Koren
- Abstract summary: We study the algorithmic stability of Nesterov's accelerated gradient method.
For convex quadratic objectives, citetchen 2018stability proved that the uniform stability of the method grows quadratically with the number of optimization steps.
We disprove this conjecture and show, for two notions of stability, that the stability of Nesterov's accelerated method in fact deteriorates emphexponentially fast with the number of gradient steps.
- Score: 26.613126943532357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the algorithmic stability of Nesterov's accelerated gradient method.
For convex quadratic objectives, \citet{chen2018stability} proved that the
uniform stability of the method grows quadratically with the number of
optimization steps, and conjectured that the same is true for the general
convex and smooth case. We disprove this conjecture and show, for two notions
of stability, that the stability of Nesterov's accelerated method in fact
deteriorates \emph{exponentially fast} with the number of gradient steps. This
stands in sharp contrast to the bounds in the quadratic case, but also to known
results for non-accelerated gradient methods where stability typically grows
linearly with the number of steps.
Related papers
- Hessian stability and convergence rates for entropic and Sinkhorn potentials via semiconcavity [4.604003661048267]
This is the first work addressing this second-order quantitative stability estimate in general unbounded settings.
We deduce exponential convergence rates for gradient and Hessian of Sinkhorns along Sinkhorn's algorithm.
arXiv Detail & Related papers (2025-04-15T12:34:09Z) - Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust Sequential Quadratic Programming method to find both first and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a approximation of the objective subject.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation [51.248784084461334]
We prove new convergence rates for a generalized version of Nesterov acceleration underrho conditions.
Our analysis reduces the dependence on the strong growth constant from $$ to $sqrt$ as compared to prior work.
arXiv Detail & Related papers (2024-04-03T00:41:19Z) - Acceleration and Implicit Regularization in Gaussian Phase Retrieval [5.484345596034159]
In this setting, we prove that methods with Polyak or Nesterov momentum implicit regularization ensures a nice convex descent.
Experimental evidence shows that these methods converge faster than gradient descent in practice.
arXiv Detail & Related papers (2023-11-21T04:10:03Z) - Accelerated gradient methods for nonconvex optimization: Escape
trajectories from strict saddle points and convergence to local minima [9.66353475649122]
This paper considers the problem.
of understanding convex behavior of a general general of accelerated gradient methods on.
non-aptotic functions.
It shows that Nesterov's accelerated method (NAG) with variable momentum avoids strict saddle points almost surely.
arXiv Detail & Related papers (2023-07-13T19:11:07Z) - Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
An analysis of convex and non- optimization methods often requires the Lipsitzness gradient, which limits the analysis by this trajectorys.
Recent work generalizes the gradient setting via the non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - A Continuized View on Nesterov Acceleration for Stochastic Gradient
Descent and Randomized Gossip [23.519404358187156]
We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.
We show that the discretization has the same structure as Nesterov acceleration, but with random parameters.
We provide the first rigorous acceleration of asynchronous gossip algorithms.
arXiv Detail & Related papers (2021-06-10T08:35:55Z) - Stability and Convergence of Stochastic Gradient Clipping: Beyond
Lipschitz Continuity and Smoothness [23.22461721824713]
Gradient clipping is a technique to stabilize the training process for problems prone to the exploding gradient problem.
This paper establishes both qualitative and quantitative results of the gradient clipped (sub)gradient method (SGD) for non-smooth convex functions.
We also study the convergence of a clipped method with momentum, which includes clipped SGD as a special case.
arXiv Detail & Related papers (2021-02-12T12:41:42Z) - Fine-Grained Analysis of Stability and Generalization for Stochastic
Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting.
To our best knowledge, this gives the firstever-known stability and generalization for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z) - Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient
Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.