Related papers: The Instability of Accelerated Gradient Descent

The Instability of Accelerated Gradient Descent

URL: http://arxiv.org/abs/2102.02167v1
Date: Wed, 3 Feb 2021 17:50:42 GMT
Title: The Instability of Accelerated Gradient Descent
Authors: Amit Attia and Tomer Koren
Abstract summary: We study the algorithmic stability of Nesterov's accelerated gradient method. For convex quadratic objectives, citetchen 2018stability proved that the uniform stability of the method grows quadratically with the number of optimization steps. We disprove this conjecture and show, for two notions of stability, that the stability of Nesterov's accelerated method in fact deteriorates emphexponentially fast with the number of gradient steps.
Score: 26.613126943532357
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the algorithmic stability of Nesterov's accelerated gradient method. For convex quadratic objectives, \citet{chen2018stability} proved that the uniform stability of the method grows quadratically with the number of optimization steps, and conjectured that the same is true for the general convex and smooth case. We disprove this conjecture and show, for two notions of stability, that the stability of Nesterov's accelerated method in fact deteriorates \emph{exponentially fast} with the number of gradient steps. This stands in sharp contrast to the bounds in the quadratic case, but also to known results for non-accelerated gradient methods where stability typically grows linearly with the number of steps.

Related papers

Hessian stability and convergence rates for entropic and Sinkhorn potentials via semiconcavity [4.604003661048267]
This is the first work addressing this second-order quantitative stability estimate in general unbounded settings. We deduce exponential convergence rates for gradient and Hessian of Sinkhorns along Sinkhorn's algorithm.
arXiv Detail & Related papers (2025-04-15T12:34:09Z)
Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust Sequential Quadratic Programming method to find both first and second-order stationary points. To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a approximation of the objective subject. To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z)
Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation [51.248784084461334]
We prove new convergence rates for a generalized version of Nesterov acceleration underrho conditions. Our analysis reduces the dependence on the strong growth constant from $$ to $sqrt$ as compared to prior work.
arXiv Detail & Related papers (2024-04-03T00:41:19Z)
Acceleration and Implicit Regularization in Gaussian Phase Retrieval [5.484345596034159]
In this setting, we prove that methods with Polyak or Nesterov momentum implicit regularization ensures a nice convex descent. Experimental evidence shows that these methods converge faster than gradient descent in practice.
arXiv Detail & Related papers (2023-11-21T04:10:03Z)
Accelerated gradient methods for nonconvex optimization: Escape trajectories from strict saddle points and convergence to local minima [9.66353475649122]
This paper considers the problem. of understanding convex behavior of a general general of accelerated gradient methods on. non-aptotic functions. It shows that Nesterov's accelerated method (NAG) with variable momentum avoids strict saddle points almost surely.
arXiv Detail & Related papers (2023-07-13T19:11:07Z)
Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
An analysis of convex and non- optimization methods often requires the Lipsitzness gradient, which limits the analysis by this trajectorys. Recent work generalizes the gradient setting via the non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z)
On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence. We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip [23.519404358187156]
We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. We show that the discretization has the same structure as Nesterov acceleration, but with random parameters. We provide the first rigorous acceleration of asynchronous gossip algorithms.
arXiv Detail & Related papers (2021-06-10T08:35:55Z)
Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness [23.22461721824713]
Gradient clipping is a technique to stabilize the training process for problems prone to the exploding gradient problem. This paper establishes both qualitative and quantitative results of the gradient clipped (sub)gradient method (SGD) for non-smooth convex functions. We also study the convergence of a clipped method with momentum, which includes clipped SGD as a special case.
arXiv Detail & Related papers (2021-02-12T12:41:42Z)
Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates. This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting. To our best knowledge, this gives the firstever-known stability and generalization for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z)
Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients. We prove new complexity that outperform state-of-the-art results in this case. We derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.