Stochastic Subspace Cubic Newton Method
- URL: http://arxiv.org/abs/2002.09526v1
- Date: Fri, 21 Feb 2020 19:42:18 GMT
- Title: Stochastic Subspace Cubic Newton Method
- Authors: Filip Hanzely, Nikita Doikov, Peter Richtárik, Yurii Nesterov
- Abstract summary: We propose a new randomized second-order optimization algorithm for minimizing a high dimensional convex function $f$.
We prove that as we vary the minibatch size, the global convergence rate of SSCN interpolates between the rate of coordinate descent (CD) and the rate of cubic regularized Newton.
Remarkably, the local convergence rate of SSCN matches the rate of subspace descent applied to the problem of minimizing the quadratic function $\frac12 (x-x^*)^\top \nabla^2 f(x^*)(x-x^*)$, where $x^*$ is the minimizer of $f$.
- Score: 14.624340432672172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a new randomized second-order optimization
algorithm---Stochastic Subspace Cubic Newton (SSCN)---for minimizing a high
dimensional convex function $f$. Our method can be seen both as a {\em
stochastic} extension of the cubically-regularized Newton method of Nesterov
and Polyak (2006), and a {\em second-order} enhancement of stochastic subspace
descent of Kozak et al. (2019). We prove that as we vary the minibatch size,
the global convergence rate of SSCN interpolates between the rate of stochastic
coordinate descent (CD) and the rate of cubic regularized Newton, thus giving
new insights into the connection between first and second-order methods.
Remarkably, the local convergence rate of SSCN matches the rate of stochastic
subspace descent applied to the problem of minimizing the quadratic function
$\frac12 (x-x^*)^\top \nabla^2f(x^*)(x-x^*)$, where $x^*$ is the minimizer of
$f$, and hence depends on the properties of $f$ at the optimum only. Our
numerical experiments show that SSCN outperforms non-accelerated first-order CD
algorithms while being competitive to their accelerated variants.
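To make the method concrete, below is a minimal, hedged sketch of an SSCN-style iteration restricted to random coordinate blocks. The toy objective, the fixed regularization constant M, the block size tau, and the bisection-based solver for the cubic subproblem are assumptions made for illustration; they are not the paper's exact sampling scheme, constants, or implementation.
```python
# A minimal SSCN-style sketch over random coordinate blocks (illustrative only).
# Assumed toy objective: f(x) = 0.5*||A x - b||^2 + 0.25*sum(x^4), which is convex
# and has an explicit gradient and Hessian; M, tau, and the subproblem solver
# below are not the paper's exact choices.
import numpy as np

rng = np.random.default_rng(0)


def f(x, A, b):
    r = A @ x - b
    return 0.5 * r @ r + 0.25 * np.sum(x**4)


def grad(x, A, b):
    return A.T @ (A @ x - b) + x**3


def hess(x, A, b):
    return A.T @ A + np.diag(3.0 * x**2)


def cubic_step(g, H, M, tol=1e-10):
    """Approximately minimize g^T h + 0.5 h^T H h + (M/6)*||h||^3 for PSD H.

    The minimizer satisfies (H + 0.5*M*r*I) h = -g with r = ||h||,
    so we locate r by bisection and then solve the linear system.
    """
    I = np.eye(g.shape[0])
    r_lo, r_hi = 0.0, np.sqrt(2.0 * np.linalg.norm(g) / M) + 1.0
    while r_hi - r_lo > tol:
        r = 0.5 * (r_lo + r_hi)
        h = np.linalg.solve(H + 0.5 * M * r * I, -g)
        if np.linalg.norm(h) > r:
            r_lo = r
        else:
            r_hi = r
    return np.linalg.solve(H + 0.5 * M * r_hi * I, -g)


def sscn(A, b, tau, M=10.0, iters=500):
    """SSCN-style loop: cubic-regularized Newton steps in random coordinate blocks."""
    n = A.shape[1]
    x = np.zeros(n)
    for _ in range(iters):
        S = rng.choice(n, size=tau, replace=False)   # random coordinate subspace
        gS = grad(x, A, b)[S]                        # gradient restricted to the block
        HS = hess(x, A, b)[np.ix_(S, S)]             # Hessian block
        x[S] += cubic_step(gS, HS, M)                # subspace cubic Newton update
    return x


A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
x_out = sscn(A, b, tau=5)
print("f(x) after SSCN-style run:", f(x_out, A, b))
```
Taking tau equal to the full dimension recovers a cubically regularized Newton step, while tau = 1 gives a coordinate-wise second-order update, mirroring the interpolation between coordinate descent and cubic regularized Newton described in the abstract.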
Related papers
- Improving Stochastic Cubic Newton with Momentum [37.1630298053787]
We show that momentum provably stabilizes the variance of the stochastic estimates.
Using our globalization technique, we prove convergence to a stationary point.
We also analyze convex stochastic Newton methods with momentum.
arXiv Detail & Related papers (2024-10-25T15:49:16Z) - Cubic regularized subspace Newton for non-convex optimization [3.481985817302898]
We analyze the coordinate second-order method SSCN, which can be interpreted as applying cubic regularization in random subspaces.
We demonstrate substantial speed-ups compared to conventional first-order methods.
arXiv Detail & Related papers (2024-06-24T14:20:02Z) - Krylov Cubic Regularized Newton: A Subspace Second-Order Method with
Dimension-Free Convergence Rate [83.3933097134767]
We introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of $O\left(\frac{1}{mk}+\frac{1}{k^2}\right)$.
Our method converges faster than existing random subspace methods, especially for high-dimensional problems.
arXiv Detail & Related papers (2024-01-05T20:24:18Z) - Accelerated Quasi-Newton Proximal Extragradient: Faster Rate for Smooth
Convex Optimization [26.328847475942894]
We prove that our method can achieve a convergence rate of $O\bigl(\min\bigl\{\frac{1}{k^2}, \frac{\sqrt{d}\log k}{k^{2.5}}\bigr\}\bigr)$, where $d$ is the problem dimension and $k$ is the number of iterations.
To the best of our knowledge, this result is the first to demonstrate a provable gain of a quasi-Newton-type method over Nesterov's accelerated gradient.
arXiv Detail & Related papers (2023-06-03T23:31:27Z) - Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained min-max optimization problems.
We show that the proposed methods generate iterates that remain within a bounded set and that the iterates converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}} F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient steps with Nesterov's acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear
Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted Hessian-averaging scheme that transitions to superlinear local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Nys-Curve: Nystr\"om-Approximated Curvature for Stochastic Optimization [20.189732632410024]
The quasi-Newton method provides curvature information by approximating the Hessian using the secant equation.
We propose an approximate-Newton-step-based stochastic optimization algorithm for large-scale empirical risk minimization of convex functions, with linear convergence rates.
arXiv Detail & Related papers (2021-10-16T14:04:51Z) - Newton-LESS: Sparsification without Trade-offs for the Sketched Newton
Update [88.73437209862891]
In second-order optimization, a potential bottleneck is computing the Hessian matrix of the objective at every iteration.
We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching (see the sketch after this list).
We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings.
arXiv Detail & Related papers (2021-07-15T17:33:05Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in the iterates caused by inexact solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z)
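The sparsified-sketching idea summarized in the Newton-LESS entry above can be illustrated on a least-squares problem: replace the exact Hessian A^T A with a sketched estimate A^T S^T S A, where S has far fewer rows than there are data points. The objective, the plain sparse Rademacher sketch, its density, and the unit step size below are illustrative assumptions rather than the paper's LESS embedding construction.
```python
# Illustrative sketched Newton step for least squares (not the exact LESS method).
# The exact Hessian A^T A is replaced by A^T S^T S A, where S is a sparse random
# sketching matrix; the sparse Rademacher sketch and its density are assumptions.
import numpy as np

rng = np.random.default_rng(1)


def sparse_sketch(m, n, density=0.1):
    """Sparse Rademacher sketch S of shape (m, n), scaled so that E[S^T S] = I_n."""
    mask = rng.random((m, n)) < density
    signs = rng.choice([-1.0, 1.0], size=(m, n))
    return mask * signs / np.sqrt(m * density)


def sketched_newton_step(A, b, x, m):
    """One sketched Newton step for f(x) = 0.5*||A x - b||^2 with exact gradient."""
    g = A.T @ (A @ x - b)                    # exact gradient
    SA = sparse_sketch(m, A.shape[0]) @ A    # sketch the Hessian square root A
    H_hat = SA.T @ SA                        # sketched Hessian estimate
    return x - np.linalg.solve(H_hat, g)


n, d, m = 2000, 50, 200
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
x = np.zeros(d)
for _ in range(10):
    x = sketched_newton_step(A, b, x, m)

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print("distance to least-squares solution:", np.linalg.norm(x - x_star))
```
Because each row of the sparse sketch has only a small fraction of nonzero entries, forming S A is much cheaper than with a dense Gaussian sketch, which is the cost saving the Newton-LESS summary refers to.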