Krylov Cubic Regularized Newton: A Subspace Second-Order Method with
Dimension-Free Convergence Rate
- URL: http://arxiv.org/abs/2401.03058v1
- Date: Fri, 5 Jan 2024 20:24:18 GMT
- Title: Krylov Cubic Regularized Newton: A Subspace Second-Order Method with
Dimension-Free Convergence Rate
- Authors: Ruichen Jiang, Parameswaran Raman, Shoham Sabach, Aryan Mokhtari,
Mingyi Hong, Volkan Cevher
- Abstract summary: We introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of $O\left(\frac{1}{mk}+\frac{1}{k^2}\right)$.
Our method converges faster than existing random subspace methods, especially for high-dimensional problems.
- Score: 83.3933097134767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Second-order optimization methods, such as cubic regularized Newton methods,
are known for their rapid convergence rates; nevertheless, they become
impractical in high-dimensional problems due to their substantial memory
requirements and computational costs. One promising approach is to execute
second-order updates within a lower-dimensional subspace, giving rise to
subspace second-order methods. However, the majority of existing subspace
second-order methods randomly select subspaces, consequently resulting in
slower convergence rates depending on the problem's dimension $d$. In this
paper, we introduce a novel subspace cubic regularized Newton method that
achieves a dimension-independent global convergence rate of
${O}\left(\frac{1}{mk}+\frac{1}{k^2}\right)$ for solving convex optimization
problems. Here, $m$ represents the subspace dimension, which can be
significantly smaller than $d$. Instead of adopting a random subspace, our
primary innovation involves performing the cubic regularized Newton update
within the Krylov subspace associated with the Hessian and the gradient of the
objective function. This result marks the first instance of a
dimension-independent convergence rate for a subspace second-order method.
Furthermore, when specific spectral conditions of the Hessian are met, our
method recovers the convergence rate of a full-dimensional cubic regularized
Newton method. Numerical experiments show our method converges faster than
existing random subspace methods, especially for high-dimensional problems.
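The following is a minimal sketch of the idea described in the abstract, assuming only access to Hessian-vector products: build an orthonormal basis of the Krylov subspace generated by the Hessian and the gradient via Lanczos, solve the small cubic regularized model in that basis, and lift the step back to the full space. The function names, the simple fixed-point subproblem solver, and the parameters m and M are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lanczos_basis(hvp, g, m):
    """Orthonormal basis Q of the Krylov subspace span{g, Hg, ..., H^{m-1} g},
    plus the coefficients of the tridiagonal projection T = Q^T H Q."""
    n = g.shape[0]
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m)
    q, q_prev = g / np.linalg.norm(g), np.zeros(n)
    for j in range(m):
        Q[:, j] = q
        w = hvp(q)                                   # Hessian-vector product H q
        alpha[j] = q @ w
        w = w - alpha[j] * q - (beta[j - 1] * q_prev if j > 0 else 0.0)
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-12:                          # hit an invariant subspace early
            return Q[:, :j + 1], alpha[:j + 1], beta[:j + 1]
        q_prev, q = q, w / beta[j]
    return Q, alpha, beta

def solve_cubic_model(T, g_hat, M, iters=100):
    """Approximately minimize g_hat^T s + 0.5 s^T T s + (M/6)||s||^3 by a
    fixed-point iteration on (T + (M/2)||s|| I) s = -g_hat; assumes T is
    positive semidefinite (convex objective), with a tiny shift for safety."""
    s = np.zeros_like(g_hat)
    I = np.eye(T.shape[0])
    for _ in range(iters):
        lam = 0.5 * M * np.linalg.norm(s) + 1e-12
        s = np.linalg.solve(T + lam * I, -g_hat)
    return s

def krylov_cubic_newton_step(grad, hess_vec, x, m=10, M=1.0):
    """One step: project onto the m-dimensional Krylov subspace of (H, g),
    solve the small cubic regularized model there, and lift the step back."""
    g = grad(x)
    if np.linalg.norm(g) == 0.0:
        return x
    Q, alpha, beta = lanczos_basis(lambda v: hess_vec(x, v), g, m)
    k = Q.shape[1]
    T = np.diag(alpha) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
    g_hat = Q.T @ g                                  # equals ||g|| e_1 up to round-off
    s = solve_cubic_model(T, g_hat, M)
    return x + Q @ s
```

Because only Hessian-vector products are required, hess_vec can be supplied matrix-free, so each step costs roughly m such products plus the solution of a small m x m tridiagonal system.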
Related papers
- Improving Stochastic Cubic Newton with Momentum [37.1630298053787]
We show that momentum provably stabilizes the variance of estimates.
Using our globalization technique, we prove convergence to a stationary point.
We also analyze stochastic cubic Newton methods with momentum in the convex setting.
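As a rough, hedged illustration of the variance-stabilization claim above (not the paper's exact algorithm), momentum can be read as exponentially averaging the noisy minibatch gradient and Hessian estimates before they enter a cubic Newton step:

```python
def momentum_second_order_estimates(g_batch, H_batch, g_bar, H_bar, beta=0.9):
    """Exponentially average noisy minibatch gradient/Hessian estimates (NumPy
    arrays); the smoothed pair (g_bar, H_bar) is what the cubic Newton step
    consumes, which is the variance-damping role momentum plays here."""
    g_bar = beta * g_bar + (1.0 - beta) * g_batch
    H_bar = beta * H_bar + (1.0 - beta) * H_batch
    return g_bar, H_bar
```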
arXiv Detail & Related papers (2024-10-25T15:49:16Z)
- Cubic regularized subspace Newton for non-convex optimization [3.481985817302898]
We analyze the coordinate second-order method SSCN, which can be interpreted as applying cubic regularization in random subspaces.
We demonstrate substantial speed-ups compared to conventional first-order methods.
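A minimal sketch of that interpretation, assuming the subspace is a random coordinate block and the corresponding Hessian block is available; the crude fixed-point subproblem solver and the constants are illustrative, and a production solver would also handle indefinite blocks in the non-convex case:

```python
import numpy as np

def coordinate_cubic_newton_step(grad, hess, x, m, M=1.0, rng=None):
    """Apply a cubic regularized Newton update inside a random m-coordinate subspace."""
    rng = rng or np.random.default_rng()
    S = rng.choice(x.size, size=m, replace=False)   # random coordinate block
    g_S = grad(x)[S]
    H_S = hess(x)[np.ix_(S, S)]                     # m x m Hessian block
    s, I = np.zeros(m), np.eye(m)
    for _ in range(100):                            # crude cubic-model solve; assumes H_S is PSD
        lam = 0.5 * M * np.linalg.norm(s) + 1e-12
        s = np.linalg.solve(H_S + lam * I, -g_S)
    x_new = x.copy()
    x_new[S] += s                                   # only the sampled coordinates move
    return x_new
```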
arXiv Detail & Related papers (2024-06-24T14:20:02Z)
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- Incremental Quasi-Newton Methods with Faster Superlinear Convergence Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on the BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework.
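A hedged sketch of the symmetric rank-1 (SR1) secant update referred to above; the incremental bookkeeping over the component functions is omitted, and the skip threshold is a standard illustrative safeguard rather than the paper's rule:

```python
import numpy as np

def sr1_update(B, s, y, tol=1e-8):
    """SR1 secant update: B_new = B + (y - Bs)(y - Bs)^T / ((y - Bs)^T s),
    skipped when the denominator is too small to be numerically safe."""
    r = y - B @ s
    denom = float(r @ s)
    if abs(denom) < tol * np.linalg.norm(r) * np.linalg.norm(s):
        return B                                    # standard safeguard: keep the old matrix
    return B + np.outer(r, r) / denom
```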
arXiv Detail & Related papers (2024-02-04T05:54:51Z)
- Accelerated Quasi-Newton Proximal Extragradient: Faster Rate for Smooth Convex Optimization [26.328847475942894]
We prove that our method can achieve a convergence rate of $O\bigl(\min\bigl\{\frac{1}{k^2}, \frac{\sqrt{d\log k}}{k^{2.5}}\bigr\}\bigr)$, where $d$ is the problem dimension and $k$ is the number of iterations.
To the best of our knowledge, this result is the first to demonstrate a provable gain of a quasi-Newton-type method over Nesterov's accelerated gradient.
arXiv Detail & Related papers (2023-06-03T23:31:27Z)
- Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z)
- Second-order optimization with lazy Hessians [55.51077907483634]
We analyze Newton's method with lazy Hessian updates for solving general possibly non-convex optimization problems.
We reuse a previously seen Hessian for several iterations while computing new gradients at each step of the method.
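A minimal sketch of that reuse pattern, with a plain Newton step and a fixed refresh period standing in for the regularized step and schedule analyzed in the paper (both simplifying assumptions):

```python
import numpy as np

def lazy_hessian_newton(grad, hess, x, refresh=5, iters=50):
    """Newton iterations that recompute the Hessian only every `refresh` steps
    and reuse the stored matrix in between; the gradient is always fresh."""
    H = None
    for t in range(iters):
        if t % refresh == 0:                        # lazy update of the Hessian
            H = hess(x)
        x = x - np.linalg.solve(H, grad(x))
    return x
```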
arXiv Detail & Related papers (2022-12-01T18:58:26Z)
- Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained min-max optimization problems.
We show that the proposed methods generate iterates that remain within a bounded set and converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z)
- DRSOM: A Dimension Reduced Second-Order Method [13.778619250890406]
Under a trust-region-like framework, our method preserves the convergence of the second-order method while using only information in a few directions.
Theoretically, we show that the method has local convergence and a global convergence rate of $O(\epsilon^{-3/2})$ to satisfy the first-order and second-order conditions.
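As a hedged illustration of "information in a few directions", the sketch below restricts the second-order model to the span of the current gradient and the previous step and solves the resulting 2 x 2 system; choosing these two directions and using a fixed ridge in place of a trust-region safeguard are assumptions for illustration, not DRSOM's exact rule:

```python
import numpy as np

def two_direction_step(grad, hess_vec, x, x_prev, ridge=1e-8):
    """Second-order step restricted to span{-gradient, previous step}: only two
    Hessian-vector products and a 2 x 2 solve are needed per iteration."""
    g = grad(x)
    d = x - x_prev
    if np.linalg.norm(d) == 0.0:
        d = -g                                      # first iteration: fall back to the gradient
    V = np.column_stack([-g, d])                    # n x 2 matrix of directions
    HV = np.column_stack([hess_vec(x, V[:, 0]), hess_vec(x, V[:, 1])])
    Q = V.T @ HV                                    # 2 x 2 reduced Hessian
    c = V.T @ g
    alpha = np.linalg.solve(Q + ridge * np.eye(2), -c)
    return x + V @ alpha
```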
arXiv Detail & Related papers (2022-07-30T13:05:01Z)
- Stochastic Subspace Cubic Newton Method [14.624340432672172]
We propose a new randomized second-order optimization algorithm for minimizing a high dimensional convex function $f$.
We prove that as we vary the minibatch size, the global convergence rate of SSCN interpolates between the rate of coordinate descent (CD) and the rate of cubic regularized Newton.
Remarkably, the local convergence rate of SSCN matches the rate of subspace descent applied to the problem of minimizing the quadratic function $\frac{1}{2}(x-x^*)^\top \nabla^2 f(x^*)(x-x^*)$, where $x^*$ is the minimizer of $f$.
arXiv Detail & Related papers (2020-02-21T19:42:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.