A modified limited memory Nesterov's accelerated quasi-Newton
- URL: http://arxiv.org/abs/2112.01327v1
- Date: Wed, 1 Dec 2021 06:48:47 GMT
- Title: A modified limited memory Nesterov's accelerated quasi-Newton
- Authors: S. Indrapriyadarsini, Shahrzad Mahboubi, Hiroshi Ninomiya, Takeshi
Kamio, Hideki Asai
- Abstract summary: The Nesterov's accelerated quasi-Newton (L)NAQ method has been shown to accelerate the conventional (L)BFGS quasi-Newton method.
The Momentum accelerated Quasi-Newton (MoQ) method showed that Nesterov's accelerated gradient can be approximated as a linear combination of past gradients.
This abstract extends the MoQ approximation to limited memory NAQ and evaluates the performance on a function approximation problem.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Nesterov's accelerated quasi-Newton (L)NAQ method has been shown to accelerate
the conventional (L)BFGS quasi-Newton method using Nesterov's accelerated
gradient in several neural network (NN) applications. However, the calculation
of two gradients per iteration increases the computational cost. The Momentum
accelerated Quasi-Newton (MoQ) method showed that Nesterov's accelerated
gradient can be approximated as a linear combination of past gradients. This
abstract extends the MoQ approximation to limited memory NAQ and evaluates the
performance on a function approximation problem.
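As a rough illustration of the MoQ idea (an assumed first-order form, not necessarily the exact coefficients used in the paper), the Nesterov gradient at the extrapolated point $w_k + \mu v_k$ can be approximated from stored past gradients as

$\nabla f(w_k + \mu v_k) \approx \nabla f(w_k) + \mu\,\big(\nabla f(w_k) - \nabla f(w_{k-1})\big) = (1+\mu)\,\nabla f(w_k) - \mu\,\nabla f(w_{k-1}),$

which removes the second gradient evaluation per iteration that plain (L)NAQ requires.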
Related papers
- Incremental Quasi-Newton Methods with Faster Superlinear Convergence
Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on the BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework.
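For reference, the symmetric rank-1 update mentioned here is the textbook SR1 formula (shown in its batch form; the paper's contribution is its incorporation into the incremental framework): with $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$,

$B_{k+1} = B_k + \dfrac{(y_k - B_k s_k)(y_k - B_k s_k)^{\top}}{(y_k - B_k s_k)^{\top} s_k}.$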
arXiv Detail & Related papers (2024-02-04T05:54:51Z)
- Accelerated gradient methods for nonconvex optimization: Escape trajectories from strict saddle points and convergence to local minima [9.66353475649122]
This paper considers the problem of understanding the behavior of a general family of accelerated gradient methods on smooth nonconvex functions.
It shows that Nesterov's accelerated gradient method (NAG) with variable momentum avoids strict saddle points almost surely.
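For context, the NAG recursion being analyzed is the standard two-step scheme (with a generic momentum coefficient $\mu_k$; the variable-momentum schedule studied in the paper is one particular choice of $\mu_k$):

$y_k = x_k + \mu_k\,(x_k - x_{k-1}),\qquad x_{k+1} = y_k - \eta\,\nabla f(y_k).$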
arXiv Detail & Related papers (2023-07-13T19:11:07Z)
- Nesterov acceleration despite very noisy gradients [2.048226951354646]
We present AGNES, a generalization of Nesterov's accelerated gradient descent algorithm.
AGNES achieves acceleration for smooth convex and strongly convex minimization tasks.
arXiv Detail & Related papers (2023-02-10T21:32:47Z)
- Revisiting the acceleration phenomenon via high-resolution differential equations [6.53306151979817]
Nesterov's accelerated gradient descent (NAG) is one of the milestones in the history of first-order algorithms.
We investigate NAG for $\mu$-strongly convex functions using the techniques of Lyapunov analysis and phase-space representation.
We find that the high-resolution differential equation framework from the implicit-velocity scheme of NAG is perfect and outperforms the gradient-correction scheme.
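As a reference point, a standard form of NAG for an $L$-smooth, $\mu$-strongly convex $f$ with step size $s$ (the textbook iteration; the paper studies its high-resolution continuous-time limits) is

$x_{k+1} = y_k - s\,\nabla f(y_k),\qquad y_{k+1} = x_{k+1} + \frac{1-\sqrt{\mu s}}{1+\sqrt{\mu s}}\,(x_{k+1} - x_k).$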
arXiv Detail & Related papers (2022-12-12T04:36:37Z)
- Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models [158.19276683455254]
Adaptive gradient algorithms borrow the moving average idea of heavy ball acceleration to estimate accurate first- and second-order moments of the gradient for accelerating convergence.
Nesterov acceleration converges faster than heavy ball acceleration in theory and also in many empirical cases.
In this paper we develop a new Nesterov momentum estimation (NME) method, which avoids the extra computation and memory overhead of computing the gradient at the extrapolation point.
We show that Adan surpasses the corresponding SoTAs on both vision transformers (ViTs) and CNNs and sets new SoTAs for many popular networks.
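A minimal sketch of the trick behind Nesterov momentum estimation, under the assumption that the look-ahead gradient can be replaced by the current gradient plus a scaled difference of already-computed gradients (an illustration of the idea only, not the Adan optimizer itself):

```python
def nag_step_naive(w, w_prev, grad, mu=0.9, lr=0.01):
    """Classical NAG-style step: needs an extra gradient evaluation
    at the extrapolated point w + mu * (w - w_prev)."""
    v = w - w_prev
    g_ahead = grad(w + mu * v)               # second gradient call per step
    return w + mu * v - lr * g_ahead

def nag_step_nme(w, w_prev, g, g_prev, mu=0.9, lr=0.01):
    """NME-style step: approximate the look-ahead gradient with the
    current gradient plus a scaled difference of stored past gradients,
    avoiding the extra gradient evaluation."""
    v = w - w_prev
    g_ahead_approx = g + mu * (g - g_prev)   # reuses gradients already computed
    return w + mu * v - lr * g_ahead_approx
```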
arXiv Detail & Related papers (2022-08-13T16:04:39Z)
- Fast Margin Maximization via Dual Acceleration [52.62944011696364]
We present and analyze a momentum-based method for training linear classifiers with an exponentially-tailed loss.
This momentum-based method is derived via the convex dual of the maximum-margin problem, and specifically by applying Nesterov acceleration to this dual.
arXiv Detail & Related papers (2021-07-01T16:36:39Z)
- A Unified Analysis of First-Order Methods for Smooth Games via Integral Quadratic Constraints [10.578409461429626]
In this work, we adapt the integral quadratic constraints theory to first-order methods for smooth and strongly monotone games.
We provide, for the first time, a global convergence rate for the negative momentum method (NM) with a complexity of $\mathcal{O}(\kappa^{1.5})$, which matches its known lower bound.
We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch.
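For orientation, the negative momentum iteration is simply the heavy-ball recursion with a momentum coefficient below zero; a minimal single-variable sketch (the paper analyzes the two-player game setting, which this simplification omits):

```python
def negative_momentum_step(x, x_prev, grad, lr=0.01, beta=-0.1):
    """One heavy-ball-style step; taking beta < 0 gives negative momentum."""
    return x - lr * grad(x) + beta * (x - x_prev)
```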
arXiv Detail & Related papers (2020-09-23T20:02:00Z)
- Hessian-Free High-Resolution Nesterov Acceleration for Sampling [55.498092486970364]
Nesterov's Accelerated Gradient (NAG) for optimization has better performance than its continuous time limit (noiseless kinetic Langevin) when a finite step-size is employed.
This work explores the sampling counterpart of this phenomenon and proposes a diffusion process whose discretizations can yield accelerated gradient-based MCMC methods.
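For reference, the kinetic (underdamped) Langevin dynamics behind this comparison is the standard SDE for a target density proportional to $e^{-f}$ with friction parameter $\gamma$ (textbook form; dropping the Brownian term gives the noiseless limit referred to above):

$dX_t = V_t\,dt,\qquad dV_t = -\gamma V_t\,dt - \nabla f(X_t)\,dt + \sqrt{2\gamma}\,dB_t.$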
arXiv Detail & Related papers (2020-06-16T15:07:37Z)
- Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without the light-tails assumption on the noise.
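A minimal sketch of the norm-clipping operator on which these methods are built (the generic operator only; clipped-SSTM pairs it with a particular accelerated scheme and clipping-level schedule not reproduced here):

```python
import numpy as np

def clip(g, lam):
    """Rescale the stochastic gradient g so that its norm is at most lam."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else g * (lam / norm)
```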
arXiv Detail & Related papers (2020-05-21T17:05:27Z)
- Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs [71.26657499537366]
We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models.
We compare it with the reverse dynamic method to train neural ODEs on classification, density estimation, and inference approximation tasks.
arXiv Detail & Related papers (2020-03-11T13:15:57Z)