Just a Momentum: Analytical Study of Momentum-Based Acceleration Methods
in Paradigmatic High-Dimensional Non-Convex Problem
- URL: http://arxiv.org/abs/2102.11755v2
- Date: Wed, 24 Feb 2021 09:40:16 GMT
- Title: Just a Momentum: Analytical Study of Momentum-Based Acceleration Methods
in Paradigmatic High-Dimensional Non-Convex Problem
- Authors: Stefano Sarao Mannelli and Pierfrancesco Urbani
- Abstract summary: When optimizing over loss functions it is common practice to use momentum-based methods rather than vanilla gradient-based methods.
We show how having a large mass increases the effective time step of the heavy-ball dynamics, leading to a speed up.
- Score: 12.132641563193584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When optimizing over loss functions it is common practice to use momentum-based accelerated methods rather than vanilla gradient-based methods. Despite being widely applied to arbitrary loss functions, their behaviour in generically non-convex, high-dimensional landscapes is poorly understood. In this work we use dynamical mean field theory techniques to describe analytically the average behaviour of these methods in a prototypical non-convex model: the (spiked) matrix-tensor model. We derive a closed set of equations that describe the behaviour of several algorithms, including heavy-ball momentum and Nesterov acceleration. Additionally, we characterize the evolution of a mathematically equivalent physical system of massive particles relaxing toward the bottom of an energetic landscape. Under the correct mapping the two dynamics are equivalent, and it can be seen that having a large mass increases the effective time step of the heavy-ball dynamics, leading to a speed up.
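As a concrete illustration of the two algorithms named in the abstract, below is a minimal Python sketch of heavy-ball momentum and Nesterov acceleration on a toy one-dimensional double-well loss. This is not the paper's DMFT analysis and not the spiked matrix-tensor model; the loss, step size, and momentum values are illustrative assumptions, and the mass/time-step comment reflects the standard discretization argument rather than the paper's exact mapping.

```python
# Minimal sketch (assumptions: toy 1D double-well loss, illustrative lr/beta),
# not the paper's analysis of the spiked matrix-tensor model.

def grad(x):
    # Gradient of the double-well loss L(x) = x**4 / 4 - x**2 / 2.
    return x**3 - x

def heavy_ball(x0, lr=0.05, beta=0.9, steps=200):
    # Heavy-ball update: x_{t+1} = x_t - lr * grad(x_t) + beta * (x_t - x_{t-1})
    x_prev, x = x0, x0
    for _ in range(steps):
        x_prev, x = x, x - lr * grad(x) + beta * (x - x_prev)
    return x

def nesterov(x0, lr=0.05, beta=0.9, steps=200):
    # Same momentum term, but the gradient is evaluated at the look-ahead point.
    x_prev, x = x0, x0
    for _ in range(steps):
        look = x + beta * (x - x_prev)
        x_prev, x = x, look - lr * grad(look)
    return x

# Physical reading (standard centred discretization of m*x'' + g*x' = -grad L,
# not necessarily the paper's exact mapping): beta = 1 - g*dt/m and lr = dt**2/m,
# so at fixed learning rate the effective time step dt = sqrt(m * lr) grows with
# the mass m -- the speed up referred to in the abstract.
print(heavy_ball(0.1), nesterov(0.1))  # both converge toward a minimum of the double well
```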
Related papers
- An optimization-based equilibrium measure describes non-equilibrium steady state dynamics: application to edge of chaos [2.5690340428649328]
Understanding neural dynamics is a central topic in machine learning, non-linear physics and neuroscience.
The dynamics are non-linear and, in particular, non-gradient, i.e., the driving force cannot be written as the gradient of a potential.
arXiv Detail & Related papers (2024-01-18T14:25:32Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Rigorous dynamical mean field theory for stochastic gradient descent methods [17.90683687731009]
We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods.
This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration.
arXiv Detail & Related papers (2022-10-12T21:10:55Z)
- A scalable deep learning approach for solving high-dimensional dynamic optimal transport [18.67654056717166]
We propose a deep learning based method to solve the dynamic optimal transport in high dimensional space.
Our method contains three main ingredients: a carefully designed representation of the velocity field, the discretization of the PDE constraint along the characteristics, and the computation of high dimensional integral by Monte Carlo method in each time step.
arXiv Detail & Related papers (2022-05-16T08:56:05Z)
- Extension of Dynamic Mode Decomposition for dynamic systems with incomplete information based on t-model of optimal prediction [69.81996031777717]
The Dynamic Mode Decomposition has proved to be a very efficient technique to study dynamic data.
The application of this approach becomes problematic if the available data is incomplete because some dimensions of smaller scale are either missing or unmeasured.
We consider a first-order approximation of the Mori-Zwanzig decomposition, state the corresponding optimization problem and solve it with the gradient-based optimization method.
arXiv Detail & Related papers (2022-02-23T11:23:59Z)
- Decimation technique for open quantum systems: a case study with driven-dissipative bosonic chains [62.997667081978825]
Unavoidable coupling of quantum systems to external degrees of freedom leads to dissipative (non-unitary) dynamics.
We introduce a method to deal with these systems based on the calculation of (dissipative) lattice Green's function.
We illustrate the power of this method with several examples of driven-dissipative bosonic chains of increasing complexity.
arXiv Detail & Related papers (2022-02-15T19:00:09Z)
- Momentum Doesn't Change the Implicit Bias [36.301490759243876]
We analyze the implicit bias of momentum-based optimization.
We construct new Lyapunov functions as a tool to analyze the gap between the model parameter and the max-margin solution.
arXiv Detail & Related papers (2021-10-08T04:37:18Z)
- A Discrete Variational Derivation of Accelerated Methods in Optimization [68.8204255655161]
We introduce a variational framework which allows us to derive different methods for optimization.
We derive two families of optimization methods in one-to-one correspondence.
The preservation of symplecticity of autonomous systems occurs here solely on the fibers.
arXiv Detail & Related papers (2021-06-04T20:21:53Z)
- Fast Gravitational Approach for Rigid Point Set Registration with Ordinary Differential Equations [79.71184760864507]
This article introduces a new physics-based method for rigid point set alignment called Fast Gravitational Approach (FGA)
In FGA, the source and target point sets are interpreted as rigid particle swarms with masses interacting in a globally multiply-linked manner while moving in a simulated gravitational force field.
We show that the new method class has characteristics not found in previous alignment methods.
arXiv Detail & Related papers (2020-09-28T15:05:39Z)
- Hessian-Free High-Resolution Nesterov Acceleration for Sampling [55.498092486970364]
Nesterov's Accelerated Gradient (NAG) for optimization has better performance than its continuous time limit (noiseless kinetic Langevin) when a finite step-size is employed.
This work explores the sampling counterpart of this phenomenon and proposes a diffusion process whose discretizations can yield accelerated gradient-based MCMC methods.
arXiv Detail & Related papers (2020-06-16T15:07:37Z)
- A New Accelerated Stochastic Gradient Method with Momentum [4.967897656554012]
Stochastic gradient descent with momentum (SGDM) uses weights that decay exponentially with the iteration number to generate a momentum term.
We provide theoretical convergence analyses for our method, showing that both the exponentially decaying weights and our inverse-proportionally decaying weights can limit the variance of the moving direction of the parameters being optimized to a region.
arXiv Detail & Related papers (2020-05-31T03:04:32Z)
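The last entry above describes SGDM as building its momentum term from weights that decay exponentially with the iteration number. A minimal sketch of that standard exponentially-weighted update is given below; the paper's inverse-proportionally decaying weights are a variant not reproduced here, and the quadratic example gradient is purely illustrative.

```python
import numpy as np

# Minimal sketch of the standard SGDM update (assumptions: exponentially
# decaying weights on past gradients, toy quadratic gradient); the paper's
# inverse-proportional weighting is not reproduced here.

def sgdm(grad_fn, x0, lr=0.01, beta=0.9, steps=100):
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)                    # stochastic minibatch gradient in practice
        v = beta * v + (1.0 - beta) * g   # past gradients enter with exponentially decaying weights
        x = x - lr * v
    return x

# Toy usage: gradient of f(x) = ||x||^2, so the iterates shrink toward zero.
print(sgdm(lambda x: 2.0 * x, x0=[1.0, -2.0]))
```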