CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics
with Simulated Annealing
- URL: http://arxiv.org/abs/2005.14605v2
- Date: Fri, 21 May 2021 15:26:37 GMT
- Title: CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics
with Simulated Annealing
- Authors: Oleksandr Borysenko and Maksym Byshkin
- Abstract summary: Deep learning applications require global optimization of non-convex objective functions, which have multiple local minima.
We show that Langevin dynamics with Simulated Annealing, a well-established approach in physical simulations, can be used to resolve the same problem.
- Score: 23.87373187143897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning applications require global optimization of non-convex
objective functions, which have multiple local minima. The same problem is
often found in physical simulations and may be resolved by the methods of
Langevin dynamics with Simulated Annealing, which is a well-established
approach for minimization of many-particle potentials. This analogy provides
useful insights for non-convex stochastic optimization in machine learning.
Here we find that integration of the discretized Langevin equation gives a
coordinate updating rule equivalent to the famous Momentum optimization
algorithm. As a main result, we show that a gradual decrease of the momentum
coefficient from the initial value close to unity until zero is equivalent to
application of Simulated Annealing or slow cooling, in physical terms. Making
use of this novel approach, we propose CoolMomentum -- a new stochastic
optimization method. Applying CoolMomentum to the optimization of Resnet-20 on
the Cifar-10 dataset and Efficientnet-B0 on Imagenet, we demonstrate that it is
able to achieve high accuracies.
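The abstract's core idea is a Momentum-style update whose coefficient is gradually cooled from a value close to unity toward zero, playing the role of Simulated Annealing. The Python sketch below is only a minimal illustration of that idea: the linear cooling schedule, the function name `coolmomentum_sketch`, and the hyperparameter values are assumptions made for demonstration, not the authors' published implementation or their derived cooling schedule.

```python
import numpy as np

def coolmomentum_sketch(grad_fn, x0, lr=0.01, beta0=0.99, steps=1000):
    """Illustrative CoolMomentum-style loop (not the authors' code).

    Momentum SGD whose momentum coefficient is cooled from beta0 (close to 1)
    down to 0 over training, mimicking slow cooling / Simulated Annealing.
    The linear schedule below is an assumption for illustration only.
    """
    x = np.asarray(x0, dtype=float)
    velocity = np.zeros_like(x)
    for t in range(steps):
        beta_t = beta0 * (1.0 - t / steps)    # assumed linear cooling from beta0 to 0
        g = grad_fn(x)                        # (stochastic) gradient at the current point
        velocity = beta_t * velocity - lr * g # Momentum update with the cooled coefficient
        x = x + velocity
    return x

if __name__ == "__main__":
    # Toy usage on a 1-D quadratic, gradient of (x - 3)^2, for illustration only.
    grad = lambda x: 2.0 * (x - 3.0)
    print(coolmomentum_sketch(grad, x0=[0.0], lr=0.05, beta0=0.95, steps=500))
```

In this reading, beta close to 1 corresponds to a high effective temperature (broad exploration), and beta approaching 0 corresponds to cooling toward plain gradient descent near a minimum.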
Related papers
- Accelerate Neural Subspace-Based Reduced-Order Solver of Deformable Simulation by Lipschitz Optimization [9.364019847856714]
Reduced-order simulation is an emerging method for accelerating physical simulations with high DOFs.
We propose a method for finding optimized subspace mappings, enabling further acceleration of neural reduced-order simulations.
We demonstrate the effectiveness of our approach through general cases in both quasi-static and dynamics simulations.
arXiv Detail & Related papers (2024-09-05T12:56:03Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - A Particle-based Sparse Gaussian Process Optimizer [5.672919245950197]
We present a new particle-swarm-based framework utilizing the underlying dynamical process of gradient descent.
The biggest advantage of this approach is greater exploration around the current state before deciding the descent direction.
arXiv Detail & Related papers (2022-11-26T09:06:15Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving non-convex optimization problems.
In this paper, we study two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - An Adaptive Gradient Method with Energy and Momentum [0.0]
We introduce a novel algorithm for gradient-based optimization of objective functions.
The method is simple to implement, computationally efficient, and well suited for large-scale machine learning problems.
arXiv Detail & Related papers (2022-03-23T04:48:38Z) - Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms is not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points.
The approach also leads to significantly smaller and computationally cheaper subproblems for lower bounding.
In total, the proposed method reduces the time to convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z) - Consensus-Based Optimization on the Sphere: Convergence to Global
Minimizers and Machine Learning [7.998311072988401]
We investigate the implementation of a new Kuramoto-Vicsek-type model for global optimization of non-convex functions on the sphere.
We present several numerical experiments, which show that the algorithm proposed in the present paper scales well with the dimension and is extremely versatile.
arXiv Detail & Related papers (2020-01-31T18:23:26Z) - A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.