Accelerated gradient methods for nonconvex optimization: Escape
trajectories from strict saddle points and convergence to local minima
- URL: http://arxiv.org/abs/2307.07030v1
- Date: Thu, 13 Jul 2023 19:11:07 GMT
- Title: Accelerated gradient methods for nonconvex optimization: Escape
trajectories from strict saddle points and convergence to local minima
- Authors: Rishabh Dixit, Mert Gurbuzbalaban, and Waheed U. Bajwa
- Abstract summary: This paper considers the problem of understanding the behavior of a general class of accelerated gradient methods on smooth nonconvex functions.
It shows that Nesterov's accelerated method (NAG) with variable momentum avoids strict saddle points almost surely.
- Score: 9.66353475649122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers the problem of understanding the behavior of a general
class of accelerated gradient methods on smooth nonconvex functions. Motivated
by some recent works that have proposed effective algorithms, based on Polyak's
heavy ball method and the Nesterov accelerated gradient method, to achieve
convergence to a local minimum of nonconvex functions, this work proposes a
broad class of Nesterov-type accelerated methods and puts forth a rigorous
study of these methods encompassing the escape from saddle-points and
convergence to local minima through both an asymptotic and a non-asymptotic
analysis. In the asymptotic regime, this paper answers an open question of
whether Nesterov's accelerated gradient method (NAG) with variable momentum
parameter avoids strict saddle points almost surely. This work also develops
two metrics of asymptotic rate of convergence and divergence, and evaluates
these two metrics for several popular standard accelerated methods such as the
NAG, and Nesterov's accelerated gradient with constant momentum (NCM) near
strict saddle points. In the local regime, this work provides an analysis that
leads to the "linear" exit time estimates from strict saddle neighborhoods for
trajectories of these accelerated methods, as well as the necessary conditions for
the existence of such trajectories. Finally, this work studies a sub-class of
accelerated methods that can converge in convex neighborhoods of nonconvex
functions with a near-optimal rate to a local minimum, and at the same time this
sub-class offers superior saddle-escape behavior compared to that of NAG.
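As a concrete illustration of the saddle-escape behavior studied above (a minimal sketch, not the paper's exact scheme), the following NumPy snippet runs NAG with the common variable momentum parameter k/(k + 3) on the strict saddle f(x, y) = (x^2 - y^2)/2; the function name, step size, and initialization are illustrative choices:

```python
import numpy as np

def grad(z):
    """Gradient of f(x, y) = (x**2 - y**2) / 2, which has a strict saddle at the origin."""
    x, y = z
    return np.array([x, -y])

def nag_variable_momentum(z0, step=0.1, iters=200):
    """Nesterov's accelerated gradient with the variable momentum
    parameter k/(k + 3), iterated from z0."""
    z_prev = z0.copy()
    z = z0.copy()
    for k in range(1, iters + 1):
        beta = k / (k + 3)                # variable momentum parameter
        v = z + beta * (z - z_prev)       # extrapolation (look-ahead) point
        z_prev = z
        z = v - step * grad(v)            # gradient step at the look-ahead point
    return z

# A generic initialization has a nonzero y-component, so the iterates
# grow along the unstable -y**2 direction and escape the saddle,
# while the stable x-component contracts toward zero.
z_final = nag_variable_momentum(np.array([1.0, 1e-6]))
print(abs(z_final[1]) > 1.0 and abs(z_final[0]) < 1.0)
```

The key point mirrored from the abstract: the unstable eigendirection of the saddle amplifies any nonzero component of the iterate, so trajectories started off the stable manifold leave the saddle neighborhood.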
Related papers
- Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation [51.248784084461334]
We prove new convergence rates for a generalized version of Nesterov acceleration under interpolation conditions.
Our analysis reduces the dependence on the strong growth constant from $\rho$ to $\sqrt{\rho}$ as compared to prior work.
arXiv Detail & Related papers (2024-04-03T00:41:19Z) - Incremental Quasi-Newton Methods with Faster Superlinear Convergence
Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework.
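The symmetric rank-1 (SR1) update mentioned above admits a compact sketch (illustrative only; the paper's incremental framework is not reproduced here, and the skip tolerance in `sr1_update` is a standard safeguard assumed for this example):

```python
import numpy as np

def sr1_update(B, s, y, tol=1e-8):
    """Symmetric rank-1 update of a Hessian approximation B, given a
    step s = x_new - x_old and gradient change y = g_new - g_old.
    Skips the update when the denominator is too small."""
    r = y - B @ s
    denom = r @ s
    if abs(denom) < tol * np.linalg.norm(r) * np.linalg.norm(s):
        return B  # skip to preserve numerical stability
    return B + np.outer(r, r) / denom

# On a quadratic f(x) = 0.5 * x @ A @ x we have y = A @ s exactly, and
# n linearly independent updates recover A (here n = 2), reflecting the
# hereditary secant property of SR1.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
B = np.eye(2)
for s in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    B = sr1_update(B, s, A @ s)
print(np.allclose(B, A))
```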
arXiv Detail & Related papers (2024-02-04T05:54:51Z) - Acceleration and Implicit Regularization in Gaussian Phase Retrieval [5.484345596034159]
In this setting, we prove that methods with Polyak or Nesterov momentum enjoy implicit regularization similar to that of gradient descent.
Experimental evidence shows that these methods converge faster than gradient descent in practice.
arXiv Detail & Related papers (2023-11-21T04:10:03Z) - Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
Analyses of convex and non-convex optimization methods often require the Lipschitz continuity of the gradient, which limits the analysis to functions bounded by quadratics.
Recent work generalizes the gradient setting via the non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z) - Almost Sure Saddle Avoidance of Stochastic Gradient Methods without the
Bounded Gradient Assumption [11.367487348673793]
We prove that various stochastic gradient methods, including the stochastic gradient descent (SGD), stochastic heavy-ball (SHB), and stochastic Nesterov's accelerated gradient (SNAG) methods, almost surely avoid any strict saddle manifold.
This is the first time such results are obtained for SHB and SNAG methods.
arXiv Detail & Related papers (2023-02-15T18:53:41Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We analyze the stochastic ExtraGradient (SEG) method with constant step size and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
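A minimal sketch of extragradient with iterate averaging on a two-dimensional bilinear game (illustrative; the stochastic noise and scheduled restarting from the paper are omitted, and the step size and iteration count are arbitrary choices):

```python
# Extragradient on the bilinear saddle problem min_x max_y x*y,
# whose unique Nash equilibrium is (0, 0). Plain gradient
# descent-ascent diverges here; the extragradient look-ahead step
# combined with iterate averaging drives the average to equilibrium.
def extragradient_avg(x0, y0, step=0.2, iters=2000):
    x, y = x0, y0
    x_sum = y_sum = 0.0
    for _ in range(iters):
        # look-ahead half step
        xh = x - step * y          # grad wrt x of x*y is y
        yh = y + step * x          # grad wrt y of x*y is x
        # real step, evaluated at the look-ahead point
        x, y = x - step * yh, y + step * xh
        x_sum += x
        y_sum += y
    return x_sum / iters, y_sum / iters

xa, ya = extragradient_avg(1.0, 1.0)
print(abs(xa) < 1e-2 and abs(ya) < 1e-2)
```

The averaged iterates approach (0, 0) even though the individual rotation-like iterates take many steps to contract, which is the role averaging plays in the SEG analysis above.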
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Acceleration Methods [57.202881673406324]
We first use quadratic optimization problems to introduce two key families of acceleration methods.
We discuss momentum methods in detail, starting with the seminal work of Nesterov.
We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates.
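A short sketch of one such restart scheme, the function-value (adaptive) restart for Nesterov's method, on an ill-conditioned quadratic (a hypothetical example; the survey covers several restart variants, and the step size and momentum schedule here are assumed choices):

```python
import numpy as np

def nesterov_with_restart(grad, f, x0, step, iters=500):
    """Nesterov's method with the function-value restart heuristic:
    whenever the objective increases, the momentum counter is reset,
    which discards accumulated momentum and empirically restores
    fast linear convergence on strongly convex problems."""
    x_prev = x0.copy()
    x = x0.copy()
    k = 1
    for _ in range(iters):
        beta = (k - 1) / (k + 2)
        v = x + beta * (x - x_prev)
        x_prev, x = x, v - step * grad(v)
        if f(x) > f(x_prev):   # restart: drop momentum
            k = 1
        else:
            k += 1
    return x

# Ill-conditioned quadratic f(x) = 0.5 * x @ diag(d) @ x with
# condition number 100; step = 1/L with L = max(d).
d = np.array([1.0, 100.0])
f = lambda x: 0.5 * x @ (d * x)
g = lambda x: d * x
x_star = nesterov_with_restart(g, f, np.array([1.0, 1.0]), step=1 / 100.0)
print(np.linalg.norm(x_star) < 1e-4)
```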
arXiv Detail & Related papers (2021-01-23T17:58:25Z) - Boundary Conditions for Linear Exit Time Gradient Trajectories Around
Saddle Points: Analysis and Algorithm [9.69596041242667]
An analysis of gradient-descent trajectories in landscapes containing strict-saddle points is presented.
Boundary conditions on initial points that lead to linear exit time from strict-saddle neighborhoods are also presented.
arXiv Detail & Related papers (2021-01-07T16:59:15Z) - A Unified Analysis of First-Order Methods for Smooth Games via Integral
Quadratic Constraints [10.578409461429626]
In this work, we adapt the integral quadratic constraints theory to first-order methods for smooth and strongly-monotone games.
We provide for the first time a global convergence rate for the negative momentum method (NM) with an iteration complexity of $\mathcal{O}(\kappa^{1.5})$, which matches its known lower bound.
We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch.
arXiv Detail & Related papers (2020-09-23T20:02:00Z) - IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method [64.15649345392822]
We introduce a framework for designing primal methods under the decentralized optimization setting where local functions are smooth and strongly convex.
Our approach consists of approximately solving a sequence of sub-problems induced by the accelerated augmented Lagrangian method.
When coupled with accelerated gradient descent, our framework yields a novel primal algorithm whose convergence rate is optimal and matched by recently derived lower bounds.
arXiv Detail & Related papers (2020-06-11T18:49:06Z) - Exit Time Analysis for Approximations of Gradient Descent Trajectories
Around Saddle Points [9.66353475649122]
This paper puts forth a rigorous geometric analysis of the gradient-descent method around strict-saddle neighborhoods.
It provides a key result that can be used to generate an approximate gradient trajectory for any given initial conditions.
The analysis leads to a linear exit-time solution for gradient-descent method under certain necessary initial conditions.
arXiv Detail & Related papers (2020-06-01T17:47:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.