Inertial Newton Algorithms Avoiding Strict Saddle Points
- URL: http://arxiv.org/abs/2111.04596v2
- Date: Mon, 12 Feb 2024 10:00:23 GMT
- Title: Inertial Newton Algorithms Avoiding Strict Saddle Points
- Authors: Camille Castera
- Abstract summary: We study the behavior of second-order algorithms mixing Newton's method and inertial gradient descent.
We show that, despite the Newtonian behavior of these methods, they almost always escape strict saddle points.
- Score: 0.7614628596146602
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the asymptotic behavior of second-order algorithms mixing Newton's
method and inertial gradient descent in non-convex landscapes. We show that,
despite the Newtonian behavior of these methods, they almost always escape
strict saddle points. We also evidence the role played by the hyper-parameters
of these methods in their qualitative behavior near critical points. The
theoretical results are supported by numerical illustrations.
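As a concrete illustration of the family of methods studied in the paper, the sketch below applies a plain explicit-Euler discretization of the damped inertial Newton dynamics (a second-order ODE combining viscous damping, Hessian-driven damping and the gradient) to a toy landscape with a strict saddle. The hyper-parameters alpha and beta, the step size gamma, the iteration budget and the test function are illustrative assumptions; the discretizations and parameter regimes analyzed in the paper may differ.

```python
import numpy as np

# Toy non-convex landscape: strict saddle at (0, 0), local minima at (+-1, 0).
def f(z):
    x, y = z
    return 0.25 * x**4 - 0.5 * x**2 + 0.5 * y**2

def grad_f(z):
    x, y = z
    return np.array([x**3 - x, y])

alpha, beta = 0.5, 0.1     # damping / Newtonian weighting (illustrative values)
gamma = 1e-2               # explicit Euler step size

theta = np.array([1e-3, 1e-3])   # start right next to the strict saddle
psi = theta.copy()               # auxiliary phase variable of the dynamics

for _ in range(20000):
    drift = (1.0 / beta - alpha) * theta - psi / beta
    theta, psi = (theta + gamma * (drift - beta * grad_f(theta)),
                  psi + gamma * drift)

# The iterates are expected to leave the saddle and settle near a minimum.
print("final point:", theta, "f =", f(theta))
```

Only gradients are evaluated here; the Newtonian (Hessian-driven) damping enters through the coupling of theta and psi.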
Related papers
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
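For reference, one iteration of the stochastic proximal point method maps the current iterate through the proximal operator of a single sampled component, x_{k+1} = argmin_z f_i(z) + ||z - x_k||^2 / (2 gamma). Below is a minimal sketch for a least-squares finite sum, where this proximal step has a closed form (via the Sherman-Morrison identity); the constant step size gamma and the synthetic data are assumptions, and the step-size rules and variants analyzed in the paper are not reproduced.

```python
import numpy as np

# Finite sum of least-squares components f_i(x) = 0.5 * (a_i @ x - b_i)**2.
rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)

gamma = 1.0        # proximal step size (kept constant here for simplicity)
x = np.zeros(d)

for _ in range(5000):
    i = rng.integers(n)            # sample one component uniformly
    a_i, b_i = A[i], b[i]
    # Closed-form proximal step: x = argmin_z f_i(z) + ||z - x||^2 / (2 gamma).
    x = x - gamma * a_i * (a_i @ x - b_i) / (1.0 + gamma * np.dot(a_i, a_i))

print("distance to x_true:", np.linalg.norm(x - x_true))
```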
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - Incremental Quasi-Newton Methods with Faster Superlinear Convergence Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework.
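The symmetric rank-1 (SR1) update referred to above has the standard form B+ = B + (y - B s)(y - B s)^T / ((y - B s)^T s), with s the step and y the gradient difference. The sketch below shows this update with the usual skipping safeguard on a quadratic, where SR1 recovers the exact Hessian after enough independent steps; the incremental bookkeeping of the paper's method (one approximation per component, aggregated over the finite sum) is omitted.

```python
import numpy as np

def sr1_update(B, s, y, r=1e-8):
    """Symmetric rank-1 update of a Hessian approximation B.

    s = x_new - x_old, y = grad_new - grad_old.  The update is skipped when
    the denominator is too small (the standard SR1 safeguard).
    """
    v = y - B @ s
    denom = v @ s
    if abs(denom) < r * np.linalg.norm(s) * np.linalg.norm(v):
        return B                    # skip the update to preserve stability
    return B + np.outer(v, v) / denom

# Tiny demonstration on a quadratic: SR1 recovers the true Hessian.
rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
H = M @ M.T + np.eye(4)             # true Hessian of f(x) = 0.5 * x^T H x
grad = lambda x: H @ x

B = np.eye(4)
x = rng.normal(size=4)
for _ in range(20):
    x_new = rng.normal(size=4)      # arbitrary probe points
    B = sr1_update(B, x_new - x, grad(x_new) - grad(x))
    x = x_new

print("approximation error:", np.linalg.norm(B - H))
```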
arXiv Detail & Related papers (2024-02-04T05:54:51Z) - Newton-CG methods for nonconvex unconstrained optimization with Hölder continuous Hessian [5.161026048499362]
We consider a nonconvex unconstrained optimization problem minimizing a twice differentiable objective function with Hölder continuous Hessian.
We develop a parameter-free Newton-CG method that requires no prior knowledge of the Hölder parameters.
We present preliminary numerical results to demonstrate the superior practical performance over a well-known regularized Newton method.
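For context, a generic (not parameter-free) truncated Newton-CG iteration solves the Newton system approximately with conjugate gradients and falls back to a gradient-related direction whenever negative curvature is detected, followed by a backtracking line search. The sketch below follows this textbook scheme; the tolerances, line-search constants and test function are assumptions, and the parameter-free mechanism of the paper is not reproduced.

```python
import numpy as np

def truncated_cg(H, g, tol=1e-8, max_iter=50):
    """Approximately solve H d = -g by CG; stop early when negative
    curvature is detected and return the last useful direction."""
    d = np.zeros_like(g)
    r = g.copy()                       # residual of H d + g = 0 at d = 0
    p = -r
    for _ in range(max_iter):
        Hp = H @ p
        curv = p @ Hp
        if curv <= 1e-12 * (p @ p):    # negative or tiny curvature
            return d if np.any(d) else -g
        alpha = (r @ r) / curv
        d = d + alpha * p
        r_new = r + alpha * Hp
        if np.linalg.norm(r_new) < tol:
            return d
        p = -r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return d

def newton_cg(f, grad, hess, x, iters=50):
    """Generic Newton-CG loop with an Armijo backtracking line search."""
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-8:
            break
        d = truncated_cg(hess(x), g)
        t = 1.0
        while f(x + t * d) > f(x) + 1e-4 * t * (g @ d) and t > 1e-12:
            t *= 0.5
        x = x + t * d
    return x

# Example: minimize a simple smooth function with a known Hessian.
f = lambda x: 0.25 * np.sum(x**4) + 0.5 * np.sum(x**2)
grad = lambda x: x**3 + x
hess = lambda x: np.diag(3.0 * x**2 + 1.0)
print(newton_cg(f, grad, hess, np.array([2.0, -3.0])))   # approaches [0, 0]
```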
arXiv Detail & Related papers (2023-11-22T01:50:43Z) - Accelerated gradient methods for nonconvex optimization: Escape trajectories from strict saddle points and convergence to local minima [9.66353475649122]
This paper considers the problem of understanding the behavior of a general class of accelerated gradient methods on smooth nonconvex functions.
It shows that Nesterov's accelerated method (NAG) with variable momentum avoids strict saddle points almost surely.
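A small numerical illustration of this statement: NAG with the variable momentum beta_k = (k - 1) / (k + 2), started slightly off the stable manifold of a strict saddle, drifts away from the saddle and approaches a local minimum. The step size, the iteration count and the toy objective are assumptions made for illustration only.

```python
import numpy as np

# Toy landscape with a strict saddle at (0, 0) and local minima at (+-1, 0).
grad = lambda z: np.array([z[0]**3 - z[0], z[1]])

s = 0.05                            # step size (assumed small enough locally)
x = np.array([1e-6, 0.5])           # tiny offset from the saddle's stable manifold
x_prev = x.copy()

for k in range(1, 3001):
    beta = (k - 1) / (k + 2)        # variable (Nesterov-type) momentum
    y = x + beta * (x - x_prev)     # look-ahead (extrapolated) point
    x_prev, x = x, y - s * grad(y)  # gradient step at the look-ahead point

print(x)   # expected to approach (+-1, 0): the strict saddle is escaped
```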
arXiv Detail & Related papers (2023-07-13T19:11:07Z) - Understanding Accelerated Gradient Methods: Lyapunov Analyses and Hamiltonian Assisted Interpretations [1.0152838128195465]
We formulate two classes of first-order algorithms more general than previously studied for minimizing smooth and strongly convex functions.
We establish sufficient conditions, via new discrete Lyapunov analyses, for achieving accelerated convergence rates which match Nesterov's methods in the strongly and general convex settings.
We propose a novel class of discrete algorithms, called the Hamiltonian assisted gradient method, directly based on a Hamiltonian function and several interpretable operations.
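To make the Hamiltonian viewpoint concrete (a generic illustration only, not the Hamiltonian assisted gradient method proposed in the paper): momentum methods can be read as damped discretizations of the Hamiltonian H(x, p) = f(x) + 0.5 * ||p||^2, alternating a friction-scaled momentum (kick) step with a position (drift) step. A minimal sketch on a quadratic, with assumed step size and friction factor:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M @ M.T / 5 + np.eye(5)         # SPD quadratic: f(x) = 0.5 * x^T A x
grad = lambda x: A @ x

eta = 0.05                          # step size (illustrative)
mu = 0.9                            # friction factor (mu = 1 would be frictionless)

x = rng.normal(size=5)
p = np.zeros(5)                     # momentum variable of H(x, p)

for _ in range(500):
    p = mu * p - eta * grad(x)      # damped momentum (kick) step
    x = x + eta * p                 # position (drift) step

print("||x|| =", np.linalg.norm(x))   # should end up close to the minimizer 0
```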
arXiv Detail & Related papers (2023-04-20T03:03:30Z) - Stochastic noise can be helpful for variational quantum algorithms [8.8226732803857]
Saddle points constitute a crucial challenge for first-order gradient descent algorithms.
We provide evidence that the saddle point problem can be naturally avoided in variational quantum algorithms by exploiting the presence of stochastic noise.
arXiv Detail & Related papers (2022-10-13T04:39:41Z) - Point Cloud Denoising via Momentum Ascent in Gradient Fields [72.93429911044903]
A gradient-based method was proposed to estimate the gradient fields from the noisy point clouds using neural networks.
We develop a momentum gradient ascent method that leverages the information of previous iterations in determining the trajectories of the points.
Experiments demonstrate that the proposed method outperforms state-of-the-art approaches with a variety of point clouds, noise types, and noise levels.
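A minimal sketch of momentum ascent in a gradient field, with a synthetic analytic field standing in for the neural-network estimate used by these methods (here the clean surface is the unit circle; the step size and momentum factor are assumptions):

```python
import numpy as np

# Synthetic stand-in for the learned gradient (score) field: it points from a
# noisy point back towards the clean surface, here the unit circle.
def score(p):
    r = np.linalg.norm(p, axis=1, keepdims=True) + 1e-12
    return (1.0 - r) * p / r            # radial pull towards radius 1

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, size=500)
clean = np.stack([np.cos(angles), np.sin(angles)], axis=1)
noisy = clean + 0.1 * rng.normal(size=clean.shape)

eta, mu = 0.2, 0.7                      # step size and momentum factor
pts = noisy.copy()
vel = np.zeros_like(pts)

for _ in range(50):
    vel = mu * vel + eta * score(pts)   # momentum accumulates past gradients
    pts = pts + vel                     # ascend the (synthetic) gradient field

print("mean distance to the circle:", np.abs(np.linalg.norm(pts, axis=1) - 1).mean())
```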
arXiv Detail & Related papers (2022-02-21T10:21:40Z) - On Convergence of Training Loss Without Reaching Stationary Points [62.41370821014218]
We show that neural network weight variables do not converge to stationary points where the gradient of the loss function vanishes.
We propose a new perspective based on the ergodic theory of dynamical systems.
arXiv Detail & Related papers (2021-10-12T18:12:23Z) - Stability and Generalization of Stochastic Gradient Methods for Minimax Problems [71.60601421935844]
Many machine learning problems can be formulated as minimax problems, such as Generative Adversarial Networks (GANs).
We provide a comprehensive generalization analysis of stochastic gradient methods for minimax problems.
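For concreteness, the basic stochastic gradient method for a minimax problem takes simultaneous descent steps in x and ascent steps in y on noisy gradients. A minimal sketch on a toy objective that is strongly convex in x and strongly concave in y, with assumed step size and noise level:

```python
import numpy as np

# min_x max_y  f(x, y) = 0.5*||x||^2 + x @ y - 0.5*||y||^2, saddle point at (0, 0).
rng = np.random.default_rng(0)
x = rng.normal(size=3)
y = rng.normal(size=3)
eta = 0.05                               # common step size

for _ in range(2000):
    noise = 0.01 * rng.normal(size=(2, 3))
    gx = x + y + noise[0]                # stochastic gradient w.r.t. x
    gy = x - y + noise[1]                # stochastic gradient w.r.t. y
    x, y = x - eta * gx, y + eta * gy    # simultaneous descent / ascent step

print("x:", x, "y:", y)                  # both should be close to the saddle point 0
```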
arXiv Detail & Related papers (2021-05-08T22:38:00Z) - Discriminative Bayesian filtering lends momentum to the stochastic Newton method for minimizing log-convex functions [0.0]
We show how the stochastic Newton method iteratively updates its estimate using subsampled versions of the gradient and Hessian.
Applying Bayesian filtering, we consider the entire history of observations.
We establish matrix-based conditions under which the effect of older observations diminishes.
We illustrate various aspects of our approach with an example and other innovations for the Newton method.
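A minimal sketch of the subsampled Newton updates referred to above, for ridge-regularized logistic regression on synthetic data: each iteration draws a mini-batch, forms the subsampled gradient and Hessian, and takes a Newton step. The Bayesian filtering over the whole history of observations, which is the paper's contribution, is omitted; the batch size, regularization and data are assumptions.

```python
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Synthetic binary classification data for a logistic-regression objective.
rng = np.random.default_rng(0)
n, d = 2000, 10
A = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = np.where(A @ w_true + 0.1 * rng.normal(size=n) > 0, 1.0, -1.0)

w = np.zeros(d)
batch, lam = 200, 1e-3                  # subsample size and ridge parameter

for _ in range(30):
    idx = rng.choice(n, size=batch, replace=False)
    Ab, yb = A[idx], y[idx]
    m = yb * (Ab @ w)
    p = sigmoid(m)
    # Subsampled gradient and Hessian of the regularized logistic loss.
    g = Ab.T @ ((p - 1.0) * yb) / batch + lam * w
    S = p * (1.0 - p)
    H = (Ab * S[:, None]).T @ Ab / batch + lam * np.eye(d)
    w = w - np.linalg.solve(H, g)       # subsampled Newton step

print("training accuracy:", np.mean(np.sign(A @ w) == y))
```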
arXiv Detail & Related papers (2021-04-27T02:39:21Z) - Stochastic Hamiltonian Gradient Methods for Smooth Games [51.47367707388402]
We focus on the class of Hamiltonian methods and provide the first convergence guarantees for certain classes of smooth games.
Using tools from the optimization literature we show that SHGD converges linearly to the neighbourhood of a stationary point.
Our results provide the first global non-asymptotic last-iterate convergence guarantees for a general class of smooth games.
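To illustrate the Hamiltonian approach on a smooth game (a deterministic sketch; the paper analyses the stochastic variant SHGD): with game vector field xi(w), one performs gradient descent on the Hamiltonian H(w) = 0.5 * ||xi(w)||^2. For the bilinear game f(x, y) = x^T M y, where simultaneous gradient descent-ascent fails to converge, this descent has a simple closed form; the matrix, step size and iteration count below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
x, y = rng.normal(size=4), rng.normal(size=4)
eta = 0.02                               # step size for descent on H

for _ in range(10000):
    # xi(w) = (M y, -M^T x), so H(w) = 0.5 * (||M y||^2 + ||M^T x||^2) and
    # gradient descent on H reads:
    x = x - eta * (M @ (M.T @ x))
    y = y - eta * (M.T @ (M @ y))

residual = np.sqrt(np.linalg.norm(M @ y)**2 + np.linalg.norm(M.T @ x)**2)
print("game vector field norm ||xi(w)||:", residual)   # should be driven towards 0
```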
arXiv Detail & Related papers (2020-07-08T15:42:13Z)