Regularized Newton Method with Global $O(1/k^2)$ Convergence
- URL: http://arxiv.org/abs/2112.02089v1
- Date: Fri, 3 Dec 2021 18:55:50 GMT
- Title: Regularized Newton Method with Global $O(1/k^2)$ Convergence
- Authors: Konstantin Mishchenko
- Abstract summary: We prove that our method converges superlinearly when the objective is strongly convex.
Our method is the first variant of Newton's method that has both cheap iterations and provably fast global convergence.
- Score: 10.685862129925729
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a Newton-type method that converges fast from any initialization
and for arbitrary convex objectives with Lipschitz Hessians. We achieve this by
merging the ideas of cubic regularization with a certain adaptive
Levenberg--Marquardt penalty. In particular, we show that the iterates given by
$x^{k+1}=x^k - \bigl(\nabla^2 f(x^k) + \sqrt{H\|\nabla f(x^k)\|}
\mathbf{I}\bigr)^{-1}\nabla f(x^k)$, where $H>0$ is a constant, converge
globally with a $\mathcal{O}(\frac{1}{k^2})$ rate. Our method is the first
variant of Newton's method that has both cheap iterations and provably fast
global convergence. Moreover, we prove that locally our method converges
superlinearly when the objective is strongly convex. To boost the method's
performance, we present a line search procedure that does not need
hyperparameters and is provably efficient.
Related papers
- On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers the RMSProp and its momentum extension and establishes the convergence rate of $frac1Tsum_k=1T.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered to be analogous to the $frac1Tsum_k=1T.
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - Higher-Order Newton Methods with Polynomial Work per Iteration [0.7568448369029973]
We present generalizations of oracles method that incorporate derivatives of an arbitrary $d$ but maintain a dependence on basins in their iteration dimension.
We show on numerical examples that of attraction around local as $d$ can get larger around local as additional assumptions are made.
arXiv Detail & Related papers (2023-11-10T20:02:58Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $mathbfx*$ rather than for all $mathbfx*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Accelerated Quasi-Newton Proximal Extragradient: Faster Rate for Smooth
Convex Optimization [26.328847475942894]
We prove that our method can achieve a convergence rate of $Obigl(minfrac1k2, fracsqrtdlog kk2.5bigr)$, where $d$ is the problem dimension and $k$ is the number of iterations.
To the best of our knowledge, this result is the first to demonstrate a provable gain of a quasi-Newton-type method over Nesterov's accelerated gradient.
arXiv Detail & Related papers (2023-06-03T23:31:27Z) - Convergence of a Normal Map-based Prox-SGD Method under the KL
Inequality [0.0]
We present a novel map-based algorithm ($mathsfnorMtext-mathsfSGD$) for $symbol$k$ convergence problems.
arXiv Detail & Related papers (2023-05-10T01:12:11Z) - Perseus: A Simple and Optimal High-Order Method for Variational
Inequalities [81.32967242727152]
A VI involves finding $xstar in mathcalX$ such that $langle F(x), x - xstarrangle geq 0$ for all $x in mathcalX$.
We propose a $pth$-order method that does textitnot require any line search procedure and provably converges to a weak solution at a rate of $O(epsilon-2/(p+1))$.
arXiv Detail & Related papers (2022-05-06T13:29:14Z) - Lifted Primal-Dual Method for Bilinearly Coupled Smooth Minimax
Optimization [47.27237492375659]
We study the bilinearly coupled minimax problem: $min_x max_y f(x) + ytop A x - h(y)$, where $f$ and $h$ are both strongly convex smooth functions.
No known first-order algorithms have hitherto achieved the lower complexity bound of $Omega(sqrtfracL_xmu_x + frac|A|sqrtmu_x,mu_y) log(frac1vareps
arXiv Detail & Related papers (2022-01-19T05:56:19Z) - A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $min_x max_y f(x) + langle Ax, yrangle - g*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual steps.
arXiv Detail & Related papers (2021-10-28T14:19:30Z) - A fast and simple modification of Newton's method helping to avoid
saddle points [0.0]
This paper roughly says that if $f$ is $C3$ and a sequence $x_n$, then the limit point is a critical point and is not a saddle point, and the convergence rate is the same as that of Newton's method.
arXiv Detail & Related papers (2020-06-02T10:38:00Z) - A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We show a proof of convergence between the Adam Adagrad and $O(d(N)/st)$ algorithms.
Adam converges with the same convergence $O(d(N)/st)$ when used with the default parameters.
arXiv Detail & Related papers (2020-03-05T01:56:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.