Newton-LESS: Sparsification without Trade-offs for the Sketched Newton
Update
- URL: http://arxiv.org/abs/2107.07480v1
- Date: Thu, 15 Jul 2021 17:33:05 GMT
- Title: Newton-LESS: Sparsification without Trade-offs for the Sketched Newton
Update
- Authors: Michał Dereziński, Jonathan Lacotte, Mert Pilanci and Michael W. Mahoney
- Abstract summary: In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration.
We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching.
We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings.
- Score: 88.73437209862891
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In second-order optimization, a potential bottleneck can be computing the
Hessian matrix of the optimized function at every iteration. Randomized
sketching has emerged as a powerful technique for constructing estimates of the
Hessian which can be used to perform approximate Newton steps. This involves
multiplication by a random sketching matrix, which introduces a trade-off
between the computational cost of sketching and the convergence rate of the
optimization algorithm. A theoretically desirable but practically much too
expensive choice is to use a dense Gaussian sketching matrix, which produces
unbiased estimates of the exact Newton step and which offers strong
problem-independent convergence guarantees. We show that the Gaussian sketching
matrix can be drastically sparsified, significantly reducing the computational
cost of sketching, without substantially affecting its convergence properties.
This approach, called Newton-LESS, is based on a recently introduced sketching
technique: LEverage Score Sparsified (LESS) embeddings. We prove that
Newton-LESS enjoys nearly the same problem-independent local convergence rate
as Gaussian embeddings, not just up to constant factors but even down to lower
order terms, for a large class of optimization tasks. In particular, this leads
to a new state-of-the-art convergence result for an iterative least squares
solver. Finally, we extend LESS embeddings to include uniformly sparsified
random sign matrices which can be implemented efficiently and which perform
well in numerical experiments.
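To make the sketched Newton update concrete, below is a minimal numerical sketch (not the authors' implementation) of one such step for least squares, using the uniformly sparsified random sign matrices mentioned at the end of the abstract. The sketch size m, the per-row sparsity s, and the helper names are illustrative assumptions.

```python
import numpy as np

def sparse_sign_sketch(m, n, s, rng):
    # Uniformly sparsified random sign sketch (a LESS-style embedding):
    # each of the m rows has s nonzeros at uniformly random positions,
    # with values +/- sqrt(n / (m * s)) so that E[S^T S] = I.
    # Helper name and scaling are illustrative assumptions.
    S = np.zeros((m, n))
    for i in range(m):
        cols = rng.choice(n, size=s, replace=False)
        S[i, cols] = rng.choice([-1.0, 1.0], size=s) * np.sqrt(n / (m * s))
    return S

def sketched_newton_step(A, b, x, m, s, rng):
    # One approximate Newton step for f(x) = 0.5 * ||Ax - b||^2:
    # the exact Hessian A^T A is replaced by the sketched estimate (SA)^T (SA),
    # while the gradient is computed exactly.
    grad = A.T @ (A @ x - b)
    SA = sparse_sign_sketch(m, A.shape[0], s, rng) @ A
    H_sketch = SA.T @ SA
    return x - np.linalg.solve(H_sketch, grad)

# Toy usage: an overdetermined least-squares problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 20))
b = rng.standard_normal(2000)
x = np.zeros(20)
for _ in range(10):
    x = sketched_newton_step(A, b, x, m=100, s=4, rng=rng)
print(np.linalg.norm(A.T @ (A @ x - b)))  # gradient norm shrinks across iterations
```

In practice the sparse sketch would be applied without materializing S as a dense matrix; the dense loop above only keeps the example short.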
Related papers
- Newton Meets Marchenko-Pastur: Massively Parallel Second-Order Optimization with Hessian Sketching and Debiasing [45.475515050909706]
We consider the problem of minimizing a convex function in a massively parallel fashion, where communication between workers is limited.
We propose a scheme where the central node (server) effectively runs a Newton method, offloading its high per-iteration cost.
In our solution, workers independently produce coarse but low-bias estimates of the inverse Hessian, using an adaptive sketching scheme.
arXiv Detail & Related papers (2024-10-02T09:38:04Z)
- Series of Hessian-Vector Products for Tractable Saddle-Free Newton Optimisation of Neural Networks [1.3654846342364308]
We show the first scalable optimisation algorithm that can efficiently use the exact inverse Hessian with absolute-value eigenvalues.
A truncated run of this series provides a new optimiser which is comparable to other first- and second-order optimisation methods.
arXiv Detail & Related papers (2023-10-23T13:11:30Z)
- Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization with cubics (ARC) methods have proven to have very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Self-concordant Smoothing for Large-Scale Convex Composite Optimization [0.0]
We introduce a notion of self-concordant smoothing for minimizing the sum of two convex functions, one of which is smooth and the other may be nonsmooth.
We prove the convergence of two resulting algorithms: Prox-N-SCORE, a proximal Newton algorithm, and Prox-GGN-SCORE, a proximal generalized Gauss-Newton algorithm.
arXiv Detail & Related papers (2023-09-04T19:47:04Z)
- Projection-Free Online Convex Optimization via Efficient Newton Iterations [10.492474737007722]
This paper presents new projection-free algorithms for Online Convex Optimization (OCO) over a convex domain $\mathcal{K} \subset \mathbb{R}^d$.
arXiv Detail & Related papers (2023-06-19T18:48:53Z)
- Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z)
- Efficient Graph Laplacian Estimation by Proximal Newton [12.05527862797306]
A graph learning problem can be formulated as a maximum likelihood estimation (MLE) of the precision matrix.
We develop a second-order approach to obtain an efficient solver utilizing several algorithmic features.
arXiv Detail & Related papers (2023-02-13T15:13:22Z)
- Second-order optimization with lazy Hessians [55.51077907483634]
We analyze Newton's method with lazy Hessian updates for solving general possibly non-convex optimization problems.
We reuse a previously seen Hessian for several iterations while computing new gradients at each step of the method (a minimal illustration of this lazy-update pattern appears after this list).
arXiv Detail & Related papers (2022-12-01T18:58:26Z)
- Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming [53.63469275932989]
We consider online statistical inference of constrained stochastic nonlinear optimization problems.
We apply the Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems.
arXiv Detail & Related papers (2022-05-27T00:34:03Z)
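As a side note on the lazy-Hessian entry above, the pattern of reusing one Hessian factorization for several steps while refreshing the gradient at every step can be illustrated with a short sketch; the reuse period k and the toy objective are illustrative assumptions, not details from that paper.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lazy_newton(grad, hess, x, steps=30, k=5):
    # Newton iteration with lazy Hessian updates: the Hessian is recomputed and
    # factorized only every k steps and the factorization is reused in between,
    # while a fresh gradient is computed at every step.
    H_factor = None
    for t in range(steps):
        if t % k == 0:
            H_factor = cho_factor(hess(x))     # expensive: refresh and factorize
        x = x - cho_solve(H_factor, grad(x))   # cheap: new gradient, old factorization
    return x

# Toy usage: a strongly convex objective 0.5*x^T A x + 0.1*sum(log cosh(x)).
A = np.diag(np.linspace(1.0, 10.0, 5))
grad = lambda x: A @ x + 0.1 * np.tanh(x)
hess = lambda x: A + np.diag(0.1 / np.cosh(x) ** 2)
x_star = lazy_newton(grad, hess, x=np.ones(5))
```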
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.