A fast and simple modification of Newton's method helping to avoid
saddle points
- URL: http://arxiv.org/abs/2006.01512v4
- Date: Thu, 9 Sep 2021 14:26:11 GMT
- Title: A fast and simple modification of Newton's method helping to avoid
saddle points
- Authors: Tuyen Trung Truong, Tat Dat To, Tuan Hang Nguyen, Thu Hang Nguyen,
Hoang Phuong Nguyen, Maged Helmy
- Abstract summary: The main result roughly says that if $f$ is $C^3$ and a sequence $\{x_n\}$, constructed by New Q-Newton's method from a random initial point, converges, then the limit point is a critical point and is not a saddle point, and the convergence rate is the same as that of Newton's method.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose in this paper New Q-Newton's method. The update rule is
conceptually very simple: $x_{n+1}=x_n-w_n$, where
$w_n=pr_{A_n,+}(v_n)-pr_{A_n,-}(v_n)$, with $A_n=\nabla^2 f(x_n)+\delta_n \|\nabla f(x_n)\|^2\,\mathrm{Id}$
and $v_n=A_n^{-1}\nabla f(x_n)$. Here $\delta_n$ is an appropriate real number
chosen so that $A_n$ is invertible, and $pr_{A_n,\pm}$ are the orthogonal
projections onto the vector subspaces spanned by the eigenvectors of positive
(respectively, negative) eigenvalues of $A_n$.
The main result of this paper roughly says that if $f$ is $C^3$ (can be
unbounded from below) and a sequence $\{x_n\}$, constructed by the New
Q-Newton's method from a random initial point $x_0$, {\bf converges}, then the
limit point is a critical point and is not a saddle point, and the convergence
rate is the same as that of Newton's method. The first author has recently
succeeded in incorporating Backtracking line search into New Q-Newton's method,
thus resolving the convergence guarantee issue observed for some (non-smooth)
cost functions. An application to quickly finding zeros of a univariate
meromorphic function is discussed. Various experiments against well-known
algorithms such as BFGS and Adaptive Cubic Regularization are presented.
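The update rule above can be sketched concretely in a few lines of NumPy. This is a minimal illustrative sketch, not the paper's reference implementation: the candidate set for $\delta_n$, the invertibility tolerance, and the toy objective are assumptions made here purely for illustration.

```python
import numpy as np

def new_q_newton_step(grad, hess, deltas=(0.0, 1.0, 2.0)):
    """One step of the update rule described in the abstract (illustrative sketch).

    grad   : gradient of f at the current point x_n (1-D array)
    hess   : Hessian of f at x_n (symmetric 2-D array)
    deltas : candidate values for delta_n; the first one making A_n invertible
             is used (this candidate set is an assumption, not the paper's choice).
    """
    g_norm2 = np.dot(grad, grad)
    for delta in deltas:
        A = hess + delta * g_norm2 * np.eye(len(grad))
        # A_n must be invertible; in floating point we test against a small tolerance.
        if abs(np.linalg.det(A)) > 1e-12:
            break
    v = np.linalg.solve(A, grad)        # v_n = A_n^{-1} . grad f(x_n)
    lam, E = np.linalg.eigh(A)          # eigen-decomposition of the symmetric A_n
    coeffs = E.T @ v                    # coordinates of v_n in the eigenbasis
    w = E @ (np.sign(lam) * coeffs)     # w_n = pr_{A_n,+}(v_n) - pr_{A_n,-}(v_n)
    return w                            # the step: x_{n+1} = x_n - w_n

# Toy usage on f(x, y) = x^2 - y^2, which has a saddle point at the origin.
grad_f = lambda x: np.array([2.0 * x[0], -2.0 * x[1]])
hess_f = lambda x: np.array([[2.0, 0.0], [0.0, -2.0]])

x = np.array([0.5, 0.3])
for _ in range(5):
    x = x - new_q_newton_step(grad_f(x), hess_f(x))
# The x-coordinate is driven to the critical value 0 in one Newton-like step,
# while the y-coordinate moves away from the saddle point at the origin.
```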
Related papers
- A simple and improved algorithm for noisy, convex, zeroth-order optimisation [59.51990161522328]
We construct an algorithm that returns a point $\hat x \in \bar{\mathcal X}$ such that $f(\hat x)$ is as small as possible.
We prove that this method is such that $f(\hat x) - \min_{x \in \bar{\mathcal X}} f(x)$ is of smaller order than $d^2/\sqrt{n}$ up to poly-logarithmic terms.
arXiv Detail & Related papers (2024-06-26T18:19:10Z) - Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ first-order oracle calls (FOs) that $\mathrm{StochasticGradientDescent}$ requires.
We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^{\alpha}\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - Distribution-Independent Regression for Generalized Linear Models with
Oblivious Corruptions [49.69852011882769]
We show the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise.
We present an algorithm that tackles this problem in its most general distribution-independent setting.
This is the first algorithmic result for GLM regression with oblivious noise that can handle more than half of the samples being arbitrarily corrupted.
arXiv Detail & Related papers (2023-09-20T21:41:59Z) - Perseus: A Simple and Optimal High-Order Method for Variational
Inequalities [81.32967242727152]
A VI involves finding $x^\star \in \mathcal{X}$ such that $\langle F(x), x - x^\star\rangle \geq 0$ for all $x \in \mathcal{X}$.
We propose a $p$th-order method that does not require any line-search procedure and provably converges to a weak solution at a rate of $O(\epsilon^{-2/(p+1)})$.
arXiv Detail & Related papers (2022-05-06T13:29:14Z) - Regularized Newton Method with Global $O(1/k^2)$ Convergence [10.685862129925729]
We prove that our method converges superlinearly when the objective is strongly convex.
Our method is the first variant of Newton's method that has both cheap iterations and provably fast global convergence.
arXiv Detail & Related papers (2021-12-03T18:55:50Z) - A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $\min_x \max_y f(x) + \langle Ax, y\rangle - g^*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual steps.
arXiv Detail & Related papers (2021-10-28T14:19:30Z) - Generalisations and improvements of New Q-Newton's method Backtracking [0.0]
We propose a general framework for the algorithm New Q-Newton's method Backtracking.
In this paper, we allow more flexibility and generality; for example, the $e_1(x),\ldots,e_m(x)$'s are not necessarily eigenvectors of $\nabla^2 f(x)$.
arXiv Detail & Related papers (2021-09-23T14:28:15Z) - New Q-Newton's method meets Backtracking line search: good convergence
guarantee, saddle points avoidance, quadratic rate of convergence, and easy
implementation [0.0]
The paper builds on a modification of Newton's method, named New Q-Newton's method, which can avoid saddle points and has a quadratic rate of convergence.
Experiments on small scale show that the method works very competitively against other well known modifications of Newton's method.
arXiv Detail & Related papers (2021-08-23T15:39:00Z) - Statistical Query Lower Bounds for List-Decodable Linear Regression [55.06171096484622]
We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples.
Our main result is a Statistical Query (SQ) lower bound of $d^{\mathrm{poly}(1/\alpha)}$ for this problem.
arXiv Detail & Related papers (2021-06-17T17:45:21Z) - Unconstrained optimisation on Riemannian manifolds [0.0]
We give explicit descriptions of versions of (Local-) Backtracking Gradient Descent and New Q-Newton's method.
We also give explicit calculations for finding the smallest eigenvalue of a symmetric square matrix.
arXiv Detail & Related papers (2020-08-25T15:10:21Z) - Backtracking Gradient Descent allowing unbounded learning rates [0.0]
In unconstrained optimisation on a Euclidean space, to prove convergence of Gradient Descent (GD) processes $x_{n+1}=x_n-\delta_n \nabla f(x_n)$, it is usually required that the learning rates $\delta_n$ are bounded.
In this paper, we allow the learning rates $\delta_n$ to be unbounded.
It is shown that this growth rate of $h$ is the best possible if one wants convergence of the sequence $\{x_n\}$. An illustrative sketch of this setting appears below.
arXiv Detail & Related papers (2020-01-07T12:52:00Z)
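As a rough illustration of the last entry's setting, the sketch below runs the GD update $x_{n+1}=x_n-\delta_n\nabla f(x_n)$ with an Armijo-type backtracking line search whose initial trial rate grows with $n$, so the accepted learning rates need not stay bounded. The growth schedule and the Armijo constants are illustrative assumptions; the paper's precise growth condition (the function $h$ mentioned above) is not reproduced here.

```python
import numpy as np

def backtracking_gd(f, grad, x0, n_iters=100, alpha=1e-4, beta=0.5):
    """Gradient descent x_{n+1} = x_n - delta_n * grad f(x_n), with delta_n chosen
    by Armijo backtracking from an initial trial rate that grows with n, so the
    accepted learning rates are not a priori bounded (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    for n in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-10:   # stop near a critical point
            break
        delta = 1.0 + n                 # unbounded initial trial rate (assumption)
        # Shrink delta until the Armijo sufficient-decrease condition holds.
        while f(x - delta * g) > f(x) - alpha * delta * np.dot(g, g):
            delta *= beta
        x = x - delta * g
    return x

# Toy usage: minimise a simple quadratic.
x_min = backtracking_gd(lambda x: float(np.dot(x, x)), lambda x: 2.0 * x, [3.0, -4.0])
```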
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.