Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion
- URL: http://arxiv.org/abs/2007.01990v1
- Date: Sat, 4 Jul 2020 02:52:11 GMT
- Title: Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion
- Authors: Yi Chen, Jinglin Chen, Jing Dong, Jian Peng, Zhaoran Wang
- Abstract summary: Langevin diffusion is a powerful method for nonconvex optimization.
We propose replica exchange, which swaps between two Langevin diffusions run at different temperatures.
By discretizing the replica exchange Langevin diffusion, we obtain a discrete-time algorithm.
- Score: 67.66101533752605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Langevin diffusion is a powerful method for nonconvex optimization, which
enables the escape from local minima by injecting noise into the gradient. In
particular, the temperature parameter controlling the noise level gives rise to
a tradeoff between "global exploration" and "local exploitation", which
correspond to high and low temperatures. To attain the advantages of both
regimes, we propose to use replica exchange, which swaps between two Langevin
diffusions with different temperatures. We theoretically analyze the
acceleration effect of replica exchange from two perspectives: (i) the
convergence in \chi^2-divergence, and (ii) the large deviation principle. Such
an acceleration effect allows us to approach the global minima faster.
Furthermore, by discretizing the replica exchange Langevin diffusion, we obtain
a discrete-time algorithm. For such an algorithm, we quantify its
discretization error in theory and demonstrate its acceleration effect in
practice.
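As a concrete illustration, below is a minimal sketch of the discrete-time scheme on a toy one-dimensional potential: two Euler-Maruyama Langevin chains run at a low and a high temperature, and their positions are occasionally swapped via a standard Metropolis-style acceptance test. The potential, step size, temperatures, and the swap rule here are illustrative assumptions, not the paper's exact swap-intensity construction.

```python
# Minimal sketch (illustrative, not the paper's exact algorithm): discrete-time
# replica exchange Langevin dynamics on a toy nonconvex potential.
import numpy as np

rng = np.random.default_rng(0)

def U(x):
    # toy nonconvex potential with several local minima
    return 0.1 * x**4 - x**2 + 0.5 * np.sin(5.0 * x)

def grad_U(x):
    # gradient of U
    return 0.4 * x**3 - 2.0 * x + 2.5 * np.cos(5.0 * x)

def replica_exchange_langevin(steps=20_000, eta=1e-3, tau_low=0.1, tau_high=2.0):
    x_low, x_high = 3.0, -3.0   # low-temperature chain exploits, high-temperature chain explores
    best = x_low
    for _ in range(steps):
        # Euler-Maruyama step for each Langevin diffusion
        x_low  += -eta * grad_U(x_low)  + np.sqrt(2.0 * eta * tau_low)  * rng.standard_normal()
        x_high += -eta * grad_U(x_high) + np.sqrt(2.0 * eta * tau_high) * rng.standard_normal()
        # Metropolis-style swap: accept with probability
        # min(1, exp((1/tau_low - 1/tau_high) * (U(x_low) - U(x_high))))
        log_s = (1.0 / tau_low - 1.0 / tau_high) * (U(x_low) - U(x_high))
        if np.log(rng.uniform()) < min(0.0, log_s):
            x_low, x_high = x_high, x_low
        if U(x_low) < U(best):
            best = x_low
    return best

print("approximate global minimizer:", replica_exchange_langevin())
```

The low-temperature chain does the optimization; the swap step lets it inherit good regions discovered by the high-temperature chain, which is the source of the acceleration effect analyzed in the paper.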
Related papers
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner [84.97253871387028]
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed.
We propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost.
Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z)
- Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation; a finite-particle sketch of MFLD is given after this list.
arXiv Detail & Related papers (2023-06-12T16:28:11Z)
- Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo [4.656426393230839]
The rise of artificial intelligence (AI) hinges on the efficient training of modern deep neural networks (DNNs) for non-convex optimization and uncertainty quantification.
In this thesis, we propose tools to handle the exploration-exploitation problem in Monte Carlo methods.
We also propose two dynamic importance sampling algorithms for the underlying ordinary differential equation (ODE) system.
arXiv Detail & Related papers (2023-05-30T18:25:11Z)
- Accelerating Convergence in Global Non-Convex Optimization with Reversible Diffusion [0.0]
Langevin Dynamics has been extensively employed in global non-convex optimization.
Our proposed method is used to investigate the trade-off between speed and discretization error.
arXiv Detail & Related papers (2023-05-19T07:49:40Z)
- Decomposed Diffusion Sampler for Accelerating Large-Scale Inverse Problems [64.29491112653905]
We propose a novel and efficient diffusion sampling strategy that synergistically combines the diffusion sampling and Krylov subspace methods.
Specifically, we prove that if the tangent space at a sample denoised by Tweedie's formula forms a Krylov subspace, then CG initialized with the denoised data ensures that the data-consistency update remains in the tangent space.
Our proposed method achieves more than 80 times faster inference time than the previous state-of-the-art method.
arXiv Detail & Related papers (2023-03-10T07:42:49Z)
- Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction [24.794221009364772]
We study the variance reduction for noisy energy estimators, which promotes much more effective swaps.
We obtain state-of-the-art results in optimization and uncertainty estimates on synthetic experiments and image data.
arXiv Detail & Related papers (2020-10-02T16:23:35Z)
- Hessian-Free High-Resolution Nesterov Acceleration for Sampling [55.498092486970364]
Nesterov's Accelerated Gradient (NAG) for optimization has better performance than its continuous time limit (noiseless kinetic Langevin) when a finite step-size is employed.
This work explores the sampling counterpart of this phenomenon and proposes a diffusion process, whose discretizations can yield accelerated gradient-based MCMC methods.
arXiv Detail & Related papers (2020-06-16T15:07:37Z)
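For the mean-field Langevin dynamics entry above, the following is a minimal finite-particle sketch under illustrative assumptions: a quadratic confining potential, a quadratic pairwise interaction, and arbitrary step size and entropy-regularization strength, none of which come from the cited paper. It only shows the defining feature of MFLD, namely a drift that depends on the empirical distribution of the particle system.

```python
# Minimal finite-particle sketch of mean-field Langevin dynamics (MFLD).
# Assumptions (not from the cited paper): V(x) = |x|^2 / 2, W(x, y) = |x - y|^2 / 2,
# and illustrative step size / regularization strength.
import numpy as np

rng = np.random.default_rng(0)

def grad_V(x):
    # gradient of the confining potential V(x) = |x|^2 / 2
    return x

def grad_W(x, y):
    # gradient in x of the pairwise interaction W(x, y) = |x - y|^2 / 2
    return x - y

def mean_field_langevin(n_particles=200, steps=5_000, eta=1e-2, lam=0.1, dim=2):
    X = rng.standard_normal((n_particles, dim))   # initial particle cloud
    for _ in range(steps):
        # distribution-dependent drift: interaction averaged over the empirical measure
        interaction = grad_W(X[:, None, :], X[None, :, :]).mean(axis=1)
        drift = -(grad_V(X) + interaction)
        # Euler-Maruyama step with noise scale set by the entropy regularization lam
        X = X + eta * drift + np.sqrt(2.0 * eta * lam) * rng.standard_normal(X.shape)
    return X

particles = mean_field_langevin()
print("empirical mean:", particles.mean(axis=0))
```

Each particle's update uses the whole particle cloud, which is exactly the finite-particle approximation whose error the cited paper's propagation-of-chaos framework controls.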