New Q-Newton's method meets Backtracking line search: good convergence
guarantee, saddle points avoidance, quadratic rate of convergence, and easy
implementation
- URL: http://arxiv.org/abs/2108.10249v1
- Date: Mon, 23 Aug 2021 15:39:00 GMT
- Title: New Q-Newton's method meets Backtracking line search: good convergence
guarantee, saddle points avoidance, quadratic rate of convergence, and easy
implementation
- Authors: Tuyen Trung Truong
- Abstract summary: A modification of Newton's method, named New Q-Newton's method, can avoid saddle points and has a quadratic rate of convergence.
Experiments on a small scale show that the method works very competitively against other well-known modifications of Newton's method.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a recent joint work, the author developed a modification of Newton's
method, named New Q-Newton's method, which can avoid saddle points and has a
quadratic rate of convergence. While a good theoretical convergence guarantee had
not been established for that method, experiments on small-scale problems show
that it works very competitively against other well-known modifications
of Newton's method, such as Adaptive Cubic Regularization and BFGS, as well as
first-order methods such as Unbounded Two-way Backtracking Gradient Descent.
In this paper, we resolve the convergence guarantee issue by proposing a
modification of New Q-Newton's method, named New Q-Newton's method
Backtracking, which incorporates a more sophisticated use of hyperparameters
and a Backtracking line search. This new method has very good theoretical
guarantees, which for a {\bf Morse function} yield the following (which is
unknown for New Q-Newton's method):
{\bf Theorem.} Let $f:\mathbb{R}^m\rightarrow \mathbb{R}$ be a Morse
function, that is, all its critical points have invertible Hessian. Then for a
sequence $\{x_n\}$ constructed by New Q-Newton's method Backtracking from a
random initial point $x_0$, we have the following two alternatives:
i) $\lim _{n\rightarrow\infty}||x_n||=\infty$,
or
ii) $\{x_n\}$ converges to a point $x_{\infty}$ which is a {\bf local
minimum} of $f$, and the rate of convergence is {\bf quadratic}.
Moreover, if $f$ has compact sublevels, then only case ii) happens.
As far as we know, for Morse functions, this is the best theoretical
guarantee for iterative optimization algorithms in the literature so far. We
have tested, in small-scale experiments, some further simplified versions of
New Q-Newton's method Backtracking, and found that the new method
significantly improves on New Q-Newton's method.
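The abstract describes the method only at a high level: regularize the Hessian with a hyperparameter-dependent term so that it becomes invertible, flip the Newton direction on the negative eigenspace so that saddle points are avoided, and control the step size with a Backtracking (Armijo) line search. The sketch below is a minimal illustration of one such step, assuming a regularization of the form $H + \delta\,\|\nabla f\|^2\,\mathrm{Id}$ and a standard Armijo condition; the function names, the hyperparameter set `deltas`, and the constants are illustrative choices, not the paper's exact construction.

```python
# A minimal sketch of one step of a New Q-Newton-type update combined with an
# Armijo backtracking line search.  The hyperparameter set `deltas`, the use of
# ||grad f||^2 as the regularization size, and the Armijo constants are
# illustrative assumptions, not the construction from the paper.
import numpy as np

def nqn_backtracking_step(f, grad, hess, x, deltas=(0.0, 1.0, 2.0),
                          beta=0.5, c=1e-4, max_backtracks=50):
    """One illustrative step: regularize the Hessian until it is invertible,
    flip the Newton direction on the negative eigenspace, then backtrack."""
    g = grad(x)
    H = hess(x)
    reg = np.linalg.norm(g) ** 2  # size of the regularization term

    # Try delta_0, delta_1, ... until H + delta * ||g||^2 * Id is invertible
    # (this sketch simply keeps the last candidate if none qualifies).
    for delta in deltas:
        eigvals, eigvecs = np.linalg.eigh(H + delta * reg * np.eye(len(x)))
        if np.all(np.abs(eigvals) > 1e-12):
            break

    # Newton-type direction with the sign flipped on negative-curvature
    # components, so that saddle directions become descent directions.
    coeffs = eigvecs.T @ g
    direction = -(eigvecs @ (coeffs / np.abs(eigvals)))

    # Armijo backtracking: shrink the step size t until sufficient decrease.
    t, fx, slope = 1.0, f(x), g @ direction
    for _ in range(max_backtracks):
        if f(x + t * direction) <= fx + c * t * slope:
            break
        t *= beta
    return x + t * direction

if __name__ == "__main__":
    # Toy Morse function with a saddle at the origin and local minima at
    # (0, +/- 1/sqrt(2)):  f(x, y) = x^2 - y^2 + y^4.
    f = lambda v: v[0] ** 2 - v[1] ** 2 + v[1] ** 4
    grad = lambda v: np.array([2.0 * v[0], -2.0 * v[1] + 4.0 * v[1] ** 3])
    hess = lambda v: np.array([[2.0, 0.0], [0.0, -2.0 + 12.0 * v[1] ** 2]])

    x = np.array([0.5, 0.1])
    for _ in range(20):
        x = nqn_backtracking_step(f, grad, hess, x)
    print(x)  # approaches the local minimum near (0, 0.7071), not the saddle
```

On this toy example the iterates leave the neighborhood of the saddle point at the origin and converge to the local minimum near $(0, 1/\sqrt{2})$, which illustrates the qualitative behavior the theorem above guarantees for Morse functions.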
Related papers
- A simple and improved algorithm for noisy, convex, zeroth-order optimisation [59.51990161522328]
We construct an algorithm that returns a point $\hat x\in \bar{\mathcal X}$ such that $f(\hat x)$ is as small as possible.
We prove that this method is such that $f(\hat x) - \min_{x\in \bar{\mathcal X}} f(x)$ is of smaller order than $d^2/\sqrt{n}$ up to poly-logarithmic terms.
arXiv Detail & Related papers (2024-06-26T18:19:10Z) - Incremental Quasi-Newton Methods with Faster Superlinear Convergence
Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework.
arXiv Detail & Related papers (2024-02-04T05:54:51Z) - Krylov Cubic Regularized Newton: A Subspace Second-Order Method with
Dimension-Free Convergence Rate [83.3933097134767]
We introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of $O\left(\frac{1}{mk}+\frac{1}{k^2}\right)$.
Our method converges faster than existing random subspace methods, especially for high-dimensional problems.
arXiv Detail & Related papers (2024-01-05T20:24:18Z) - Backtracking New Q-Newton's method, Newton's flow, Voronoi's diagram and
Stochastic root finding [0.0]
A new variant of Newton's method - named Backtracking New Q-Newton's method (BNQN) - has strong theoretical guarantee, is easy to implement, and has good experimental performance.
arXiv Detail & Related papers (2024-01-02T15:37:47Z) - Sketch-and-Project Meets Newton Method: Global $\mathcal O(k^{-2})$ Convergence with Low-Rank Updates [1.592855981846993]
We propose the first sketch-and-project Newton method with a fast $\mathcal O(k^{-2})$ global convergence rate for self-concordant functions.
SGN inherits the best of all three worlds: the cheap iteration costs of sketch-and-project methods, the state-of-the-art $\mathcal O(k^{-2})$ global convergence rate of full-rank Newton-like methods, and the algorithmic simplicity of damped Newton methods.
arXiv Detail & Related papers (2023-05-22T14:51:40Z) - Regularized Newton Method with Global $O(1/k^2)$ Convergence [10.685862129925729]
We prove that our method converges superlinearly when the objective is strongly convex.
Our method is the first variant of Newton's method that has both cheap iterations and provably fast global convergence.
arXiv Detail & Related papers (2021-12-03T18:55:50Z) - Newton-LESS: Sparsification without Trade-offs for the Sketched Newton
Update [88.73437209862891]
In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration.
We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching.
We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings.
arXiv Detail & Related papers (2021-07-15T17:33:05Z) - Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum.
arXiv Detail & Related papers (2020-09-29T18:48:21Z) - Unconstrained optimisation on Riemannian manifolds [0.0]
We give explicit descriptions of versions of (Local-) Backtracking Gradient Descent and New Q-Newton's method.
We also give explicit calculations for the smallest eigenvalue of a symmetric square matrix.
arXiv Detail & Related papers (2020-08-25T15:10:21Z) - A fast and simple modification of Newton's method helping to avoid
saddle points [0.0]
Roughly speaking, this paper shows that if $f$ is $C^3$ and the sequence $\{x_n\}$ constructed by the method converges, then the limit point is a critical point which is not a saddle point, and the rate of convergence is the same as that of Newton's method.
arXiv Detail & Related papers (2020-06-02T10:38:00Z) - A Newton Frank-Wolfe Method for Constrained Self-Concordant Minimization [60.90222082871258]
We demonstrate how to scalably solve a class of constrained self-concordant minimization problems using linear minimization oracles (LMO) over the constraint set.
We prove that the number of LMO calls of our method is nearly the same as that of the Frank-Wolfe method in the L-smooth case.
arXiv Detail & Related papers (2020-02-17T15:28:31Z)