Improving Stochastic Cubic Newton with Momentum
- URL: http://arxiv.org/abs/2410.19644v1
- Date: Fri, 25 Oct 2024 15:49:16 GMT
- Title: Improving Stochastic Cubic Newton with Momentum
- Authors: El Mahdi Chayti, Nikita Doikov, Martin Jaggi
- Abstract summary: We show that momentum provably stabilizes the variance of the stochastic gradient and Hessian estimates.
Using the cubic regularization technique, we prove a global convergence rate to a second-order stationary point.
We also show improved rates on convex stochastic problems for regularized Newton methods with momentum.
- Score: 37.1630298053787
- Abstract: We study stochastic second-order methods for solving general non-convex optimization problems. We propose using a special version of momentum to stabilize the stochastic gradient and Hessian estimates in Newton's method. We show that momentum provably improves the variance of stochastic estimates and allows the method to converge for any noise level. Using the cubic regularization technique, we prove a global convergence rate for our method on general non-convex problems to a second-order stationary point, even when using only a single stochastic data sample per iteration. This starkly contrasts with all existing stochastic second-order methods for non-convex problems, which typically require large batches. Therefore, we are the first to demonstrate global convergence for batches of arbitrary size in the non-convex case for the Stochastic Cubic Newton. Additionally, we show improved speed on convex stochastic problems for our regularized Newton methods with momentum.
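The method described in the abstract combines momentum-averaged stochastic gradient and Hessian estimates with a cubically regularized Newton step. Below is a minimal sketch of that idea under stated assumptions: the momentum weight `beta`, the regularization constant `M`, the single-sample oracles `grad_f`/`hess_f`, and the inner gradient loop used to approximately solve the cubic subproblem are all illustrative choices, not the authors' implementation.

```python
import numpy as np

def momentum_cubic_newton(x0, grad_f, hess_f, sample, T=100, beta=0.1, M=10.0):
    """Hedged sketch of stochastic cubic Newton with momentum.

    grad_f(x, xi) / hess_f(x, xi): stochastic gradient / Hessian at x for a
    single data sample xi; sample() draws one fresh sample per iteration.
    beta is the momentum weight, M the cubic regularization constant.
    """
    x = np.asarray(x0, dtype=float)
    xi = sample()
    g = grad_f(x, xi)          # momentum-averaged gradient estimate
    H = hess_f(x, xi)          # momentum-averaged Hessian estimate

    for _ in range(T):
        # Cubic model: m(s) = g^T s + 0.5 s^T H s + (M/6)||s||^3.
        # Solved approximately with a few gradient steps (illustrative solver).
        s = np.zeros_like(x)
        for _ in range(50):
            grad_m = g + H @ s + 0.5 * M * np.linalg.norm(s) * s
            s -= 0.01 * grad_m
        x = x + s

        # Momentum update of the estimates from ONE new stochastic sample.
        xi = sample()
        g = (1.0 - beta) * g + beta * grad_f(x, xi)
        H = (1.0 - beta) * H + beta * hess_f(x, xi)
    return x
```

With `beta = 1` the estimates reduce to plain single-sample stochastic cubic Newton; a smaller `beta` averages over past samples, which is the variance-stabilization effect the abstract refers to.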
Related papers
- Fast Unconstrained Optimization via Hessian Averaging and Adaptive Gradient Sampling Methods [0.3222802562733786]
We consider minimizing finite-sum expectation objective functions via Hessian-averaging based subsampled Newton methods.
These methods allow for inexactness and have fixed per-iteration Hessian approximation costs.
We present novel analysis techniques and discuss challenges of their practical implementation.
arXiv Detail & Related papers (2024-08-14T03:27:48Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped methods for stochastic optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha - 1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods [37.1630298053787]
We propose a new framework, which we call the helper framework.
It provides a unified view of stochastic and variance-reduced second-order algorithms equipped with global complexity guarantees.
arXiv Detail & Related papers (2023-02-23T12:18:28Z) - Second-order optimization with lazy Hessians [55.51077907483634]
We analyze Newton's method with lazy Hessian updates for solving general possibly non-convex optimization problems.
We reuse a previously computed Hessian for several iterations while computing new gradients at each step of the method (a minimal sketch appears after this list).
arXiv Detail & Related papers (2022-12-01T18:58:26Z) - Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches [22.925108493465363]
We propose a finite-sum minimization algorithm which enjoys all the benefits of second-order methods.
We show that SVRN can accelerate many second-order methods (such as Subsampled Newton) as well as iterative least squares solvers.
arXiv Detail & Related papers (2022-06-06T16:00:18Z) - Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming [53.63469275932989]
We consider online statistical inference of constrained nonlinear optimization problems.
We apply the Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems.
arXiv Detail & Related papers (2022-05-27T00:34:03Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local superlinear convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate gradients of the same population objective.
We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z) - Stochastic Subspace Cubic Newton Method [14.624340432672172]
We propose a new randomized second-order optimization algorithm for minimizing a high dimensional convex function $f$.
We prove that as we vary the minibatch size, the global convergence rate of SSCN interpolates between the rate of coordinate descent (CD) and the rate of cubic regularized Newton.
Remarkably, the local convergence rate of SSCN matches the rate of subspace descent applied to the problem of minimizing the quadratic function $\frac{1}{2}(x - x^*)^\top \nabla^2 f(x^*)(x - x^*)$, where $x^*$ is the minimizer of $f$.
arXiv Detail & Related papers (2020-02-21T19:42:18Z)
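Referenced from the lazy-Hessian entry above: a minimal sketch, assuming a recomputation period `m`, a damping constant `lam`, and a simple test objective (all illustrative), of Newton iterations that reuse a stale Hessian while refreshing the gradient at every step.

```python
import numpy as np

def lazy_newton(x0, grad_f, hess_f, T=30, m=5, lam=1e-6):
    """Hedged sketch of Newton's method with lazy Hessian updates:
    the Hessian is recomputed only every m iterations, while a new
    gradient is computed at every step. m, lam, and the linear solver
    are illustrative choices, not the paper's exact algorithm.
    """
    x = np.asarray(x0, dtype=float)
    H = hess_f(x)
    for t in range(T):
        if t % m == 0:               # refresh the Hessian only occasionally
            H = hess_f(x)
        g = grad_f(x)                # fresh gradient at every iteration
        # Damped Newton step using the (possibly stale) Hessian.
        step = np.linalg.solve(H + lam * np.eye(len(x)), g)
        x = x - step
    return x

# Usage on a simple ill-conditioned test function (illustrative).
A = np.diag([1.0, 10.0, 100.0])
f_grad = lambda x: A @ x + 0.1 * x**3
f_hess = lambda x: A + np.diag(0.3 * x**2)
x_star = lazy_newton(np.ones(3), f_grad, f_hess)
```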