CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics
with Simulated Annealing
- URL: http://arxiv.org/abs/2005.14605v2
- Date: Fri, 21 May 2021 15:26:37 GMT
- Title: CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics
with Simulated Annealing
- Authors: Oleksandr Borysenko and Maksym Byshkin
- Abstract summary: Deep learning applications require global optimization of non-convex objective functions, which have multiple local minima.
We show that Langevin dynamics with Simulated Annealing, a well-established approach in physical simulations, can be used to resolve the same problem.
- Score: 23.87373187143897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning applications require global optimization of non-convex
objective functions, which have multiple local minima. The same problem is
often found in physical simulations and may be resolved by the methods of
Langevin dynamics with Simulated Annealing, which is a well-established
approach for minimization of many-particle potentials. This analogy provides
useful insights for non-convex stochastic optimization in machine learning.
Here we find that integration of the discretized Langevin equation gives a
coordinate updating rule equivalent to the famous Momentum optimization
algorithm. As a main result, we show that a gradual decrease of the momentum
coefficient from the initial value close to unity until zero is equivalent to
application of Simulated Annealing or slow cooling, in physical terms. Making
use of this novel approach, we propose CoolMomentum -- a new stochastic
optimization method. Applying CoolMomentum to the optimization of Resnet-20 on
the Cifar-10 dataset and Efficientnet-B0 on Imagenet, we demonstrate that it is
able to achieve high accuracies.
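The abstract's core idea is a Momentum-style update whose coefficient is gradually cooled from a value close to unity toward zero, playing the role of Simulated Annealing. The Python sketch below is only a minimal illustration of that idea: the linear cooling schedule, the function name `coolmomentum_sketch`, and the hyperparameter values are assumptions made for demonstration, not the authors' published implementation or their derived cooling schedule.

```python
import numpy as np

def coolmomentum_sketch(grad_fn, x0, lr=0.01, beta0=0.99, steps=1000):
    """Illustrative CoolMomentum-style loop (not the authors' code).

    Momentum SGD whose momentum coefficient is cooled from beta0 (close to 1)
    down to 0 over training, mimicking slow cooling / Simulated Annealing.
    The linear schedule below is an assumption for illustration only.
    """
    x = np.asarray(x0, dtype=float)
    velocity = np.zeros_like(x)
    for t in range(steps):
        beta_t = beta0 * (1.0 - t / steps)    # assumed linear cooling from beta0 to 0
        g = grad_fn(x)                        # (stochastic) gradient at the current point
        velocity = beta_t * velocity - lr * g # Momentum update with the cooled coefficient
        x = x + velocity
    return x

if __name__ == "__main__":
    # Toy usage on a 1-D quadratic, gradient of (x - 3)^2, for illustration only.
    grad = lambda x: 2.0 * (x - 3.0)
    print(coolmomentum_sketch(grad, x0=[0.0], lr=0.05, beta0=0.95, steps=500))
```

In this reading, beta close to 1 corresponds to a high effective temperature (broad exploration), and beta approaching 0 corresponds to cooling toward plain gradient descent near a minimum.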
Related papers
- Accelerate Neural Subspace-Based Reduced-Order Solver of Deformable Simulation by Lipschitz Optimization [9.364019847856714]
Reduced-order simulation is an emerging method for accelerating physical simulations with high DOFs.
We propose a method for finding optimized subspace mappings, enabling further acceleration of neural reduced-order simulations.
We demonstrate the effectiveness of our approach through general cases in both quasi-static and dynamics simulations.
arXiv Detail & Related papers (2024-09-05T12:56:03Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - A Particle-based Sparse Gaussian Process Optimizer [5.672919245950197]
We present a new particle-swarm-based framework utilizing the underlying dynamical process of gradient descent.
The biggest advantage of this approach is greater exploration around the current state before deciding the descent direction.
arXiv Detail & Related papers (2022-11-26T09:06:15Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving non-convex optimization problems.
In this paper, we study two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - An Adaptive Gradient Method with Energy and Momentum [0.0]
We introduce a novel algorithm for gradient-based optimization of objective functions.
The method is simple to implement, computationally efficient, and well suited for large-scale machine learning problems.
arXiv Detail & Related papers (2022-03-23T04:48:38Z) - Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms is not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points.
The approach also leads to significantly smaller and computationally cheaper subproblems for lower bounding.
In total, the proposed method reduces the time to convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z) - Consensus-Based Optimization on the Sphere: Convergence to Global
Minimizers and Machine Learning [7.998311072988401]
We investigate the implementation of a new Kuramoto-Vicsek-type model for global optimization of non-convex functions on the sphere.
We present several numerical experiments, which show that the algorithm proposed in the present paper scales well with the dimension and is extremely versatile.
arXiv Detail & Related papers (2020-01-31T18:23:26Z) - A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.