Newton-LESS: Sparsification without Trade-offs for the Sketched Newton
Update
- URL: http://arxiv.org/abs/2107.07480v1
- Date: Thu, 15 Jul 2021 17:33:05 GMT
- Title: Newton-LESS: Sparsification without Trade-offs for the Sketched Newton
Update
- Authors: Michał Dereziński, Jonathan Lacotte, Mert Pilanci and Michael W. Mahoney
- Abstract summary: In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration.
We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching.
We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings.
- Score: 88.73437209862891
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In second-order optimization, a potential bottleneck can be computing the
Hessian matrix of the optimized function at every iteration. Randomized
sketching has emerged as a powerful technique for constructing estimates of the
Hessian which can be used to perform approximate Newton steps. This involves
multiplication by a random sketching matrix, which introduces a trade-off
between the computational cost of sketching and the convergence rate of the
optimization algorithm. A theoretically desirable but practically much too
expensive choice is to use a dense Gaussian sketching matrix, which produces
unbiased estimates of the exact Newton step and which offers strong
problem-independent convergence guarantees. We show that the Gaussian sketching
matrix can be drastically sparsified, significantly reducing the computational
cost of sketching, without substantially affecting its convergence properties.
This approach, called Newton-LESS, is based on a recently introduced sketching
technique: LEverage Score Sparsified (LESS) embeddings. We prove that
Newton-LESS enjoys nearly the same problem-independent local convergence rate
as Gaussian embeddings, not just up to constant factors but even down to lower
order terms, for a large class of optimization tasks. In particular, this leads
to a new state-of-the-art convergence result for an iterative least squares
solver. Finally, we extend LESS embeddings to include uniformly sparsified
random sign matrices which can be implemented efficiently and which perform
well in numerical experiments.
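To make the sketched Newton update concrete, below is a minimal numerical sketch (not the authors' implementation) of one such step for least squares, using the uniformly sparsified random sign matrices mentioned at the end of the abstract. The sketch size m, the per-row sparsity s, and the helper names are illustrative assumptions.

```python
import numpy as np

def sparse_sign_sketch(m, n, s, rng):
    # Uniformly sparsified random sign sketch (a LESS-style embedding):
    # each of the m rows has s nonzeros at uniformly random positions,
    # with values +/- sqrt(n / (m * s)) so that E[S^T S] = I.
    # Helper name and scaling are illustrative assumptions.
    S = np.zeros((m, n))
    for i in range(m):
        cols = rng.choice(n, size=s, replace=False)
        S[i, cols] = rng.choice([-1.0, 1.0], size=s) * np.sqrt(n / (m * s))
    return S

def sketched_newton_step(A, b, x, m, s, rng):
    # One approximate Newton step for f(x) = 0.5 * ||Ax - b||^2:
    # the exact Hessian A^T A is replaced by the sketched estimate (SA)^T (SA),
    # while the gradient is computed exactly.
    grad = A.T @ (A @ x - b)
    SA = sparse_sign_sketch(m, A.shape[0], s, rng) @ A
    H_sketch = SA.T @ SA
    return x - np.linalg.solve(H_sketch, grad)

# Toy usage: an overdetermined least-squares problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 20))
b = rng.standard_normal(2000)
x = np.zeros(20)
for _ in range(10):
    x = sketched_newton_step(A, b, x, m=100, s=4, rng=rng)
print(np.linalg.norm(A.T @ (A @ x - b)))  # gradient norm shrinks across iterations
```

In practice the sparse sketch would be applied without materializing S as a dense matrix; the dense loop above only keeps the example short.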
Related papers
- Newton Meets Marchenko-Pastur: Massively Parallel Second-Order Optimization with Hessian Sketching and Debiasing [45.475515050909706]
We consider the problem of minimizing a convex function in a massively parallel fashion, where communication between workers is limited.
We propose a scheme where the central node (server) effectively runs a Newton method, offloading its high per-iteration cost.
In our solution, workers independently produce coarse but low-bias estimates of the inverse Hessian, using an adaptive sketching scheme.
arXiv Detail & Related papers (2024-10-02T09:38:04Z)
- Series of Hessian-Vector Products for Tractable Saddle-Free Newton Optimisation of Neural Networks [1.3654846342364308]
We show the first scalable optimisation algorithm that can efficiently use the exact inverse Hessian with absolute-value eigenvalues.
A truncated run of this series provides a new optimiser which is comparable to other first- and second-order optimisation methods.
arXiv Detail & Related papers (2023-10-23T13:11:30Z)
- Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization with cubics (ARC) methods have proven to have very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Self-concordant Smoothing for Large-Scale Convex Composite Optimization [0.0]
We introduce a notion of self-concordant smoothing for minimizing the sum of two convex functions, one of which is smooth and the other may be nonsmooth.
We prove the convergence of two resulting algorithms: Prox-N-SCORE, a proximal Newton algorithm, and Prox-GGN-SCORE, a proximal generalized Gauss-Newton algorithm.
arXiv Detail & Related papers (2023-09-04T19:47:04Z)
- Projection-Free Online Convex Optimization via Efficient Newton Iterations [10.492474737007722]
This paper presents new projection-free algorithms for Online Convex Optimization (OCO) over a convex domain $\mathcal{K} \subset \mathbb{R}^d$.
arXiv Detail & Related papers (2023-06-19T18:48:53Z)
- Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z)
- Efficient Graph Laplacian Estimation by Proximal Newton [12.05527862797306]
A graph learning problem can be formulated as a maximum likelihood estimation (MLE) of the precision matrix.
We develop a second-order approach to obtain an efficient solver utilizing several algorithmic features.
arXiv Detail & Related papers (2023-02-13T15:13:22Z)
- Second-order optimization with lazy Hessians [55.51077907483634]
We analyze Newton's method with lazy Hessian updates for solving general possibly non-convex optimization problems.
We reuse a previously seen Hessian for several iterations while computing new gradients at each step of the method (a minimal illustration of this lazy-update pattern appears after this list).
arXiv Detail & Related papers (2022-12-01T18:58:26Z)
- Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming [53.63469275932989]
We consider online statistical inference of constrained stochastic nonlinear optimization problems.
We apply the Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems.
arXiv Detail & Related papers (2022-05-27T00:34:03Z)
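As a side note on the lazy-Hessian entry above, the pattern of reusing one Hessian factorization for several steps while refreshing the gradient at every step can be illustrated with a short sketch; the reuse period k and the toy objective are illustrative assumptions, not details from that paper.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lazy_newton(grad, hess, x, steps=30, k=5):
    # Newton iteration with lazy Hessian updates: the Hessian is recomputed and
    # factorized only every k steps and the factorization is reused in between,
    # while a fresh gradient is computed at every step.
    H_factor = None
    for t in range(steps):
        if t % k == 0:
            H_factor = cho_factor(hess(x))     # expensive: refresh and factorize
        x = x - cho_solve(H_factor, grad(x))   # cheap: new gradient, old factorization
    return x

# Toy usage: a strongly convex objective 0.5*x^T A x + 0.1*sum(log cosh(x)).
A = np.diag(np.linspace(1.0, 10.0, 5))
grad = lambda x: A @ x + 0.1 * np.tanh(x)
hess = lambda x: A + np.diag(0.1 / np.cosh(x) ** 2)
x_star = lazy_newton(grad, hess, x=np.ones(5))
```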
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.