Series of Hessian-Vector Products for Tractable Saddle-Free Newton
Optimisation of Neural Networks
- URL: http://arxiv.org/abs/2310.14901v2
- Date: Tue, 27 Feb 2024 14:13:44 GMT
- Title: Series of Hessian-Vector Products for Tractable Saddle-Free Newton
Optimisation of Neural Networks
- Authors: Elre T. Oldewage, Ross M. Clarke and José Miguel Hernández-Lobato
- Abstract summary: We propose what is, to our knowledge, the first efficiently-scalable optimisation algorithm to asymptotically use the exact inverse Hessian with absolute-value eigenvalues.
The method is framed as a series of Hessian-vector products; a truncation of this series provides a new optimisation algorithm comparable to other first- and second-order optimisation methods.
- Score: 1.3654846342364308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their popularity in the field of continuous optimisation,
second-order quasi-Newton methods are challenging to apply in machine learning,
as the Hessian matrix is intractably large. This computational burden is
exacerbated by the need to address non-convexity, for instance by modifying the
Hessian's eigenvalues as in Saddle-Free Newton methods. We propose an
optimisation algorithm which addresses both of these concerns - to our
knowledge, the first efficiently-scalable optimisation algorithm to
asymptotically use the exact inverse Hessian with absolute-value eigenvalues.
Our method frames the problem as a series which principally square-roots and
inverts the squared Hessian, then uses it to precondition a gradient vector,
all without explicitly computing or eigendecomposing the Hessian. A truncation
of this infinite series provides a new optimisation algorithm which is scalable
and comparable to other first- and second-order optimisation methods in both
runtime and optimisation performance. We demonstrate this in a variety of
settings, including a ResNet-18 trained on CIFAR-10.
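To make the mechanics concrete, below is a minimal, hypothetical sketch of the core idea, not the authors' released code: it assumes the parameters form a single flat vector, picks an assumed scaling constant `c` large enough that the binomial series for the inverse square root of the scaled squared Hessian converges, and truncates the series at an assumed `num_terms`. Every term touches the Hessian only through Hessian-vector products.

```python
import jax
import jax.numpy as jnp
from scipy.special import binom

def hvp(loss_fn, params, v):
    # Hessian-vector product H @ v via forward-over-reverse autodiff;
    # the Hessian itself is never materialised.
    return jax.jvp(jax.grad(loss_fn), (params,), (v,))[1]

def abs_inv_hessian_times(loss_fn, params, g, c=1.0, num_terms=10):
    # Truncated binomial series for |H|^{-1} g = (H^2)^{-1/2} g, assuming c is
    # chosen so that the spectrum of H^2 / c lies in (0, 2):
    #   (H^2)^{-1/2} = c^{-1/2} * sum_k binom(-1/2, k) * (H^2 / c - I)^k
    t = g            # tracks (H^2 / c - I)^k g
    out = t          # k = 0 term, since binom(-1/2, 0) = 1
    for k in range(1, num_terms):
        h2t = hvp(loss_fn, params, hvp(loss_fn, params, t)) / c  # (H^2 / c) t
        t = h2t - t
        out = out + binom(-0.5, k) * t
    return out / jnp.sqrt(c)

# Illustrative usage: precondition the gradient before taking a step.
# g = jax.grad(loss_fn)(params)
# params = params - learning_rate * abs_inv_hessian_times(loss_fn, params, g)
```

The paper's full algorithm also deals with stochastic mini-batches and the choice of scaling and truncation length; the sketch above only shows how such a series can be evaluated with Hessian-vector products alone.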
Related papers
- Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) methods and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Self-concordant Smoothing for Large-Scale Convex Composite Optimization [0.0]
We introduce a notion of self-concordant smoothing for minimizing the sum of two convex functions, one of which is smooth and the other may be nonsmooth.
We prove the convergence of two resulting algorithms: Prox-N-SCORE, a proximal Newton algorithm and Prox-GGN-SCORE, a proximal generalized Gauss-Newton algorithm.
arXiv Detail & Related papers (2023-09-04T19:47:04Z) - Accelerating Cutting-Plane Algorithms via Reinforcement Learning
Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms.
Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability.
We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
arXiv Detail & Related papers (2023-07-17T20:11:56Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit connections between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving non-convex optimization problems.
In this paper, we study two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics (the plain SGLD update is sketched after this list).
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Newton-LESS: Sparsification without Trade-offs for the Sketched Newton
Update [88.73437209862891]
In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration.
We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching.
We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings (a toy sketched-Newton step is illustrated after this list).
arXiv Detail & Related papers (2021-07-15T17:33:05Z) - SHINE: SHaring the INverse Estimate from the forward pass for bi-level
optimization and implicit models [15.541264326378366]
In recent years, implicit deep learning has emerged as a method to increase the depth of deep neural networks.
The training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix.
We propose a novel strategy to tackle this computational bottleneck from which many bi-level problems suffer.
arXiv Detail & Related papers (2021-06-01T15:07:34Z) - Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for
Nonconvex Stochastic Optimization [17.219297142656828]
We introduce a quasi-Newton method for non-convex stochastic optimization, which dynamically incorporates the curvature of the loss by approximating the Hessian with a diagonal matrix.
The implementation of the algorithm is available at https://www.xuezmax.com/XuezMax/apollo.
arXiv Detail & Related papers (2020-09-28T19:07:02Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove an $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
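For the Langevin dynamics entry above, the following is a textbook-style sketch of the plain SGLD update (my assumed baseline, without the variance-reduced or recursive-gradient modifications analysed in that paper):

```python
import jax
import jax.numpy as jnp

def sgld_step(params, grad, step_size, key):
    # One plain SGLD update: a gradient step plus Gaussian noise whose scale,
    # sqrt(2 * step_size), is matched to the step size so the iterates sample
    # rather than merely descend.
    noise = jax.random.normal(key, params.shape)
    return params - step_size * grad + jnp.sqrt(2.0 * step_size) * noise
```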
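For the Newton-LESS entry above, here is a toy illustration of a sketched Newton step under an assumed least-squares-style setup where the Hessian is A^T A; the names and shapes are mine, and Newton-LESS's point is that the dense Gaussian sketch used here can be heavily sparsified with essentially no loss in local convergence rate:

```python
import jax
import jax.numpy as jnp

def sketched_newton_direction(A, grad, key, sketch_size):
    # Approximate the Newton direction (A^T A)^{-1} grad by sketching A with a
    # Gaussian matrix S; assumes sketch_size >= A.shape[1] so the sketched
    # Hessian estimate is invertible.
    n, _ = A.shape
    S = jax.random.normal(key, (sketch_size, n)) / jnp.sqrt(sketch_size)
    SA = S @ A                       # (sketch_size, d), much shorter than A
    H_sketch = SA.T @ SA             # sketched Hessian estimate of A^T A
    return jnp.linalg.solve(H_sketch, grad)

# Illustrative usage:
# key = jax.random.PRNGKey(0)
# x = x - step_size * sketched_newton_direction(A, jax.grad(f)(x), key, sketch_size=256)
```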