Stochastic Subspace Cubic Newton Method
- URL: http://arxiv.org/abs/2002.09526v1
- Date: Fri, 21 Feb 2020 19:42:18 GMT
- Title: Stochastic Subspace Cubic Newton Method
- Authors: Filip Hanzely, Nikita Doikov, Peter Richtárik, Yurii Nesterov
- Abstract summary: We propose a new randomized second-order optimization algorithm for minimizing a high dimensional convex function $f$.
We prove that as we vary the minibatch size, the global convergence rate of SSCN interpolates between the rate of coordinate descent (CD) and the rate of cubic regularized Newton.
Remarkably, the local convergence rate of SSCN matches the rate of subspace descent applied to the problem of minimizing the quadratic function $\frac12 (x-x^*)^\top \nabla^2 f(x^*)(x-x^*)$, where $x^*$ is the minimizer of $f$.
- Score: 14.624340432672172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a new randomized second-order optimization
algorithm---Stochastic Subspace Cubic Newton (SSCN)---for minimizing a high
dimensional convex function $f$. Our method can be seen both as a {\em
stochastic} extension of the cubically-regularized Newton method of Nesterov
and Polyak (2006), and a {\em second-order} enhancement of stochastic subspace
descent of Kozak et al. (2019). We prove that as we vary the minibatch size,
the global convergence rate of SSCN interpolates between the rate of stochastic
coordinate descent (CD) and the rate of cubic regularized Newton, thus giving
new insights into the connection between first and second-order methods.
Remarkably, the local convergence rate of SSCN matches the rate of stochastic
subspace descent applied to the problem of minimizing the quadratic function
$\frac12 (x-x^*)^\top \nabla^2f(x^*)(x-x^*)$, where $x^*$ is the minimizer of
$f$, and hence depends on the properties of $f$ at the optimum only. Our
numerical experiments show that SSCN outperforms non-accelerated first-order CD
algorithms while being competitive to their accelerated variants.
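To make the method concrete, below is a minimal, hedged sketch of an SSCN-style iteration restricted to random coordinate blocks. The toy objective, the fixed regularization constant M, the block size tau, and the bisection-based solver for the cubic subproblem are assumptions made for illustration; they are not the paper's exact sampling scheme, constants, or implementation.
```python
# A minimal SSCN-style sketch over random coordinate blocks (illustrative only).
# Assumed toy objective: f(x) = 0.5*||A x - b||^2 + 0.25*sum(x^4), which is convex
# and has an explicit gradient and Hessian; M, tau, and the subproblem solver
# below are not the paper's exact choices.
import numpy as np

rng = np.random.default_rng(0)


def f(x, A, b):
    r = A @ x - b
    return 0.5 * r @ r + 0.25 * np.sum(x**4)


def grad(x, A, b):
    return A.T @ (A @ x - b) + x**3


def hess(x, A, b):
    return A.T @ A + np.diag(3.0 * x**2)


def cubic_step(g, H, M, tol=1e-10):
    """Approximately minimize g^T h + 0.5 h^T H h + (M/6)*||h||^3 for PSD H.

    The minimizer satisfies (H + 0.5*M*r*I) h = -g with r = ||h||,
    so we locate r by bisection and then solve the linear system.
    """
    I = np.eye(g.shape[0])
    r_lo, r_hi = 0.0, np.sqrt(2.0 * np.linalg.norm(g) / M) + 1.0
    while r_hi - r_lo > tol:
        r = 0.5 * (r_lo + r_hi)
        h = np.linalg.solve(H + 0.5 * M * r * I, -g)
        if np.linalg.norm(h) > r:
            r_lo = r
        else:
            r_hi = r
    return np.linalg.solve(H + 0.5 * M * r_hi * I, -g)


def sscn(A, b, tau, M=10.0, iters=500):
    """SSCN-style loop: cubic-regularized Newton steps in random coordinate blocks."""
    n = A.shape[1]
    x = np.zeros(n)
    for _ in range(iters):
        S = rng.choice(n, size=tau, replace=False)   # random coordinate subspace
        gS = grad(x, A, b)[S]                        # gradient restricted to the block
        HS = hess(x, A, b)[np.ix_(S, S)]             # Hessian block
        x[S] += cubic_step(gS, HS, M)                # subspace cubic Newton update
    return x


A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
x_out = sscn(A, b, tau=5)
print("f(x) after SSCN-style run:", f(x_out, A, b))
```
Taking tau equal to the full dimension recovers a cubically regularized Newton step, while tau = 1 gives a coordinate-wise second-order update, mirroring the interpolation between coordinate descent and cubic regularized Newton described in the abstract.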
Related papers
- Improving Stochastic Cubic Newton with Momentum [37.1630298053787]
We show that momentum provably stabilizes the variance of the stochastic estimates.
Using our globalization technique, we prove convergence to a stationary point.
We also analyze convex stochastic Newton methods with momentum.
arXiv Detail & Related papers (2024-10-25T15:49:16Z) - Cubic regularized subspace Newton for non-convex optimization [3.481985817302898]
We analyze the coordinate second-order method SSCN, which can be interpreted as applying cubic regularization in random subspaces.
We demonstrate substantial speed-ups compared to conventional first-order methods.
arXiv Detail & Related papers (2024-06-24T14:20:02Z) - Krylov Cubic Regularized Newton: A Subspace Second-Order Method with
Dimension-Free Convergence Rate [83.3933097134767]
We introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of $O\left(\frac{1}{mk}+\frac{1}{k^2}\right)$.
Our method converges faster than existing random subspace methods, especially for high-dimensional problems.
arXiv Detail & Related papers (2024-01-05T20:24:18Z) - Accelerated Quasi-Newton Proximal Extragradient: Faster Rate for Smooth
Convex Optimization [26.328847475942894]
We prove that our method can achieve a convergence rate of $O\bigl(\min\bigl\{\frac{1}{k^2}, \frac{\sqrt{d}\log k}{k^{2.5}}\bigr\}\bigr)$, where $d$ is the problem dimension and $k$ is the number of iterations.
To the best of our knowledge, this result is the first to demonstrate a provable gain of a quasi-Newton-type method over Nesterov's accelerated gradient.
arXiv Detail & Related papers (2023-06-03T23:31:27Z) - Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained min-max optimization problems.
We show that the proposed methods generate iterates that remain within a bounded set and that the iterates converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}} F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient steps with Nesterov's acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear
Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted Hessian-averaging scheme that transitions to superlinear local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Nys-Curve: Nystr\"om-Approximated Curvature for Stochastic Optimization [20.189732632410024]
The quasi-Newton method provides curvature information by approximating the Hessian using the secant equation.
We propose an approximate-Newton-step-based stochastic optimization algorithm for large-scale empirical risk minimization of convex functions, with linear convergence rates.
arXiv Detail & Related papers (2021-10-16T14:04:51Z) - Newton-LESS: Sparsification without Trade-offs for the Sketched Newton
Update [88.73437209862891]
In second-order optimization, a potential bottleneck is computing the Hessian matrix of the objective at every iteration.
We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching (see the sketch after this list).
We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings.
arXiv Detail & Related papers (2021-07-15T17:33:05Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in the iterates caused by inexact solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z)
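The sparsified-sketching idea summarized in the Newton-LESS entry above can be illustrated on a least-squares problem: replace the exact Hessian A^T A with a sketched estimate A^T S^T S A, where S has far fewer rows than there are data points. The objective, the plain sparse Rademacher sketch, its density, and the unit step size below are illustrative assumptions rather than the paper's LESS embedding construction.
```python
# Illustrative sketched Newton step for least squares (not the exact LESS method).
# The exact Hessian A^T A is replaced by A^T S^T S A, where S is a sparse random
# sketching matrix; the sparse Rademacher sketch and its density are assumptions.
import numpy as np

rng = np.random.default_rng(1)


def sparse_sketch(m, n, density=0.1):
    """Sparse Rademacher sketch S of shape (m, n), scaled so that E[S^T S] = I_n."""
    mask = rng.random((m, n)) < density
    signs = rng.choice([-1.0, 1.0], size=(m, n))
    return mask * signs / np.sqrt(m * density)


def sketched_newton_step(A, b, x, m):
    """One sketched Newton step for f(x) = 0.5*||A x - b||^2 with exact gradient."""
    g = A.T @ (A @ x - b)                    # exact gradient
    SA = sparse_sketch(m, A.shape[0]) @ A    # sketch the Hessian square root A
    H_hat = SA.T @ SA                        # sketched Hessian estimate
    return x - np.linalg.solve(H_hat, g)


n, d, m = 2000, 50, 200
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
x = np.zeros(d)
for _ in range(10):
    x = sketched_newton_step(A, b, x, m)

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print("distance to least-squares solution:", np.linalg.norm(x - x_star))
```
Because each row of the sparse sketch has only a small fraction of nonzero entries, forming S A is much cheaper than with a dense Gaussian sketch, which is the cost saving the Newton-LESS summary refers to.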