Limited-Memory Greedy Quasi-Newton Method with Non-asymptotic
Superlinear Convergence Rate
- URL: http://arxiv.org/abs/2306.15444v2
- Date: Wed, 18 Oct 2023 17:21:28 GMT
- Title: Limited-Memory Greedy Quasi-Newton Method with Non-asymptotic
Superlinear Convergence Rate
- Authors: Zhan Gao and Aryan Mokhtari and Alec Koppel
- Abstract summary: We present a Limited-memory Greedy BFGS (LG-BFGS) method that can achieve an explicit non-asymptotic superlinear rate.
Our established non-asymptotic superlinear convergence rate demonstrates an explicit trade-off between the convergence speed and memory requirement.
- Score: 37.49160762239869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-asymptotic convergence analysis of quasi-Newton methods has gained
attention with a landmark result establishing an explicit local superlinear
rate of $O((1/\sqrt{t})^t)$. The methods that obtain this rate, however,
exhibit a well-known drawback: they require the storage of the previous Hessian
approximation matrix or all past curvature information to form the current
Hessian inverse approximation. Limited-memory variants of quasi-Newton methods
such as the celebrated L-BFGS alleviate this issue by leveraging a limited
window of past curvature information to construct the Hessian inverse
approximation. As a result, their per-iteration complexity and storage
requirement is $O(\tau d)$, where $\tau \le d$ is the size of the window and $d$
is the problem dimension, reducing the $O(d^2)$ computational cost and memory
requirement of standard quasi-Newton methods. However, to the best of our
knowledge, there is no result showing a non-asymptotic superlinear convergence
rate for any limited-memory quasi-Newton method. In this work, we close this
gap by presenting a Limited-memory Greedy BFGS (LG-BFGS) method that can
achieve an explicit non-asymptotic superlinear rate. We incorporate
displacement aggregation, i.e., a decorrelating projection, to post-process
gradient variations, together with a basis-vector selection scheme on variable
variations that greedily maximizes a progress measure of the Hessian estimate
toward the true Hessian. Their combination allows past curvature information to
remain in a sparse subspace while yielding a valid representation of the full
history. Interestingly, our established non-asymptotic superlinear convergence
rate demonstrates an explicit trade-off between the convergence speed and
memory requirement, which, to our knowledge, is the first of its kind. Numerical
results corroborate our theoretical findings and demonstrate the effectiveness
of our method.
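To illustrate the kind of update the abstract describes, here is a minimal Python sketch (not the authors' implementation) of a single greedy BFGS-style step, assuming access to a hypothetical Hessian-vector product oracle `hvp(x, u)`: a standard basis vector is chosen to maximize a progress-measure ratio between the true Hessian and the current approximation, and the classical BFGS formula is then applied with that direction. The actual LG-BFGS method additionally post-processes gradient variations with displacement aggregation and stores only a window of $\tau$ curvature pairs rather than the dense matrix used here.

```python
# Illustrative sketch, NOT the authors' LG-BFGS implementation.
# Assumes `hvp(x, u)` returns the Hessian-vector product (grad^2 f(x)) u.
# LG-BFGS further applies displacement aggregation and keeps only tau pairs;
# a dense approximation B is kept here purely to show the greedy selection rule.
import numpy as np

def greedy_bfgs_update(B, hvp, x):
    """One greedy BFGS-style update of the Hessian approximation B at x."""
    d = B.shape[0]
    basis = np.eye(d)
    # Greedy basis-vector selection: maximize e_i^T (grad^2 f) e_i / e_i^T B e_i.
    hess_diag = np.array([hvp(x, basis[i])[i] for i in range(d)])
    i = int(np.argmax(hess_diag / np.diag(B)))
    u = basis[i]          # selected (sparse) variable variation
    y = hvp(x, u)         # corresponding gradient variation along u
    # Classical BFGS update with the pair (u, y).
    Bu = B @ u
    return B - np.outer(Bu, Bu) / (u @ Bu) + np.outer(y, y) / (y @ u)
```

For instance, with a quadratic $f(x) = \tfrac{1}{2} x^\top A x$, one can pass `hvp = lambda x, u: A @ u` and observe $B$ approaching $A$ over repeated updates, mirroring the kind of progress the paper's selection rule targets.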
Related papers
- Online Learning Guided Quasi-Newton Methods with Global Non-Asymptotic Convergence [20.766358513158206]
We prove a global convergence rate of $O(\min\{1/k, \sqrt{d}/k^{1.25}\})$ in terms of the duality gap.
These results are the first global convergence results to demonstrate a provable advantage of a quasi-Newton method over the extragradient method.
arXiv Detail & Related papers (2024-10-03T16:08:16Z) - Incremental Gauss--Newton Methods with Superlinear Convergence Rates [16.92437325972209]
We introduce a novel Incremental Gauss--Newton (IGN) method with an explicit superlinear convergence rate.
In particular, we formulate the problem as a nonlinear least-squares problem with a finite-sum structure.
We also provide a mini-batch extension to our IGN method that obtains an even faster superlinear convergence rate.
arXiv Detail & Related papers (2024-07-03T15:26:34Z) - Incremental Quasi-Newton Methods with Faster Superlinear Convergence
Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on the BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 (SR1) update into the incremental framework; the BFGS and SR1 update formulas are sketched after this list.
arXiv Detail & Related papers (2024-02-04T05:54:51Z) - High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise [59.25598762373543]
We establish high-probability convergence guarantees for nonlinear stochastic gradient descent on streaming data in the presence of heavy-tailed noise.
We also demonstrate, analytically and numerically, how this guides the preferred choice of setting for a given problem.
arXiv Detail & Related papers (2023-10-28T18:53:41Z) - Online Learning Guided Curvature Approximation: A Quasi-Newton Method
with Global Non-Asymptotic Superlinear Convergence [22.286753988825048]
We present the first globally convergent quasi-Newton method with an explicit non-asymptotic superlinear convergence rate.
Unlike classical quasi-Newton methods, we build our algorithm upon the hybrid proximal extragradient method.
arXiv Detail & Related papers (2023-02-16T20:58:09Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear
Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Newton-LESS: Sparsification without Trade-offs for the Sketched Newton
Update [88.73437209862891]
In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration.
We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching.
We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings.
arXiv Detail & Related papers (2021-07-15T17:33:05Z) - Consistent Online Gaussian Process Regression Without the Sample
Complexity Bottleneck [14.309243378538012]
We propose an online compression scheme that fixes an error neighborhood with respect to the Hellinger metric centered at the current posterior.
For a constant error radius, POG converges to a neighborhood of the population posterior (Theorem 1(ii)), but with finite memory at worst determined by the metric entropy of the feature space.
arXiv Detail & Related papers (2020-04-23T11:52:06Z) - Non-asymptotic Superlinear Convergence of Standard Quasi-Newton Methods [26.328847475942894]
We prove the non-asymptotic superlinear convergence rate of the Broyden class of quasi-Newton algorithms.
Our results are the first to provide a non-asymptotic superlinear convergence rate for quasi-Newton methods.
arXiv Detail & Related papers (2020-03-30T16:42:41Z)
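As a reference for the BFGS- and SR1-based entries above (see the incremental quasi-Newton item), the two classical updates of a Hessian approximation $B$ for a curvature pair $(s, y)$ are the standard textbook formulas below; they are included only for orientation and are not taken from any specific paper in this list.

```latex
% Classical quasi-Newton updates of a Hessian approximation B for a pair (s, y):
B_{\mathrm{BFGS}} = B - \frac{B s s^{\top} B}{s^{\top} B s} + \frac{y y^{\top}}{y^{\top} s},
\qquad
B_{\mathrm{SR1}} = B + \frac{(y - B s)(y - B s)^{\top}}{(y - B s)^{\top} s}.
```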