Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion
- URL: http://arxiv.org/abs/2007.01990v1
- Date: Sat, 4 Jul 2020 02:52:11 GMT
- Title: Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion
- Authors: Yi Chen, Jinglin Chen, Jing Dong, Jian Peng, Zhaoran Wang
- Abstract summary: Langevin diffusion is a powerful method for nonconvex optimization.
We propose replica exchange, which swaps between two Langevin diffusions run at different temperatures.
By discretizing the replica exchange Langevin diffusion, we obtain a discrete-time algorithm.
- Score: 67.66101533752605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Langevin diffusion is a powerful method for nonconvex optimization, which
enables the escape from local minima by injecting noise into the gradient. In
particular, the temperature parameter controlling the noise level gives rise to
a tradeoff between "global exploration" and "local exploitation", which
correspond to high and low temperatures. To attain the advantages of both
regimes, we propose to use replica exchange, which swaps between two Langevin
diffusions with different temperatures. We theoretically analyze the
acceleration effect of replica exchange from two perspectives: (i) the
convergence in \chi^2-divergence, and (ii) the large deviation principle. Such
an acceleration effect allows us to approach the global minima faster.
Furthermore, by discretizing the replica exchange Langevin diffusion, we obtain
a discrete-time algorithm. For such an algorithm, we quantify its
discretization error in theory and demonstrate its acceleration effect in
practice.
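As a concrete illustration, below is a minimal sketch of the discrete-time scheme on a toy one-dimensional potential: two Euler-Maruyama Langevin chains run at a low and a high temperature, and their positions are occasionally swapped via a standard Metropolis-style acceptance test. The potential, step size, temperatures, and the swap rule here are illustrative assumptions, not the paper's exact swap-intensity construction.

```python
# Minimal sketch (illustrative, not the paper's exact algorithm): discrete-time
# replica exchange Langevin dynamics on a toy nonconvex potential.
import numpy as np

rng = np.random.default_rng(0)

def U(x):
    # toy nonconvex potential with several local minima
    return 0.1 * x**4 - x**2 + 0.5 * np.sin(5.0 * x)

def grad_U(x):
    # gradient of U
    return 0.4 * x**3 - 2.0 * x + 2.5 * np.cos(5.0 * x)

def replica_exchange_langevin(steps=20_000, eta=1e-3, tau_low=0.1, tau_high=2.0):
    x_low, x_high = 3.0, -3.0   # low-temperature chain exploits, high-temperature chain explores
    best = x_low
    for _ in range(steps):
        # Euler-Maruyama step for each Langevin diffusion
        x_low  += -eta * grad_U(x_low)  + np.sqrt(2.0 * eta * tau_low)  * rng.standard_normal()
        x_high += -eta * grad_U(x_high) + np.sqrt(2.0 * eta * tau_high) * rng.standard_normal()
        # Metropolis-style swap: accept with probability
        # min(1, exp((1/tau_low - 1/tau_high) * (U(x_low) - U(x_high))))
        log_s = (1.0 / tau_low - 1.0 / tau_high) * (U(x_low) - U(x_high))
        if np.log(rng.uniform()) < min(0.0, log_s):
            x_low, x_high = x_high, x_low
        if U(x_low) < U(best):
            best = x_low
    return best

print("approximate global minimizer:", replica_exchange_langevin())
```

The low-temperature chain does the optimization; the swap step lets it inherit good regions discovered by the high-temperature chain, which is the source of the acceleration effect analyzed in the paper.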
Related papers
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner [84.97253871387028]
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed.
We propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost.
Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z)
- Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation; a finite-particle sketch of MFLD is given after this list.
arXiv Detail & Related papers (2023-06-12T16:28:11Z)
- Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte Carlo [4.656426393230839]
The rise of artificial intelligence (AI) hinges on the efficient training of modern deep neural networks (DNNs) for non-convex optimization and uncertainty quantification.
In this thesis, we propose tools to handle the exploration-exploitation problem in Monte Carlo methods.
We also propose two dynamic importance sampling algorithms for the underlying ordinary differential equation (ODE) system.
arXiv Detail & Related papers (2023-05-30T18:25:11Z)
- Accelerating Convergence in Global Non-Convex Optimization with Reversible Diffusion [0.0]
Langevin Dynamics has been extensively employed in global non-convex optimization.
Our proposed method is used to investigate the trade-off between speed and discretization error.
arXiv Detail & Related papers (2023-05-19T07:49:40Z)
- Decomposed Diffusion Sampler for Accelerating Large-Scale Inverse Problems [64.29491112653905]
We propose a novel and efficient diffusion sampling strategy that synergistically combines the diffusion sampling and Krylov subspace methods.
Specifically, we prove that if the tangent space at a sample denoised by Tweedie's formula forms a Krylov subspace, then CG initialized with the denoised data ensures that the data-consistency update remains in the tangent space.
Our proposed method achieves more than 80 times faster inference time than the previous state-of-the-art method.
arXiv Detail & Related papers (2023-03-10T07:42:49Z)
- Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction [24.794221009364772]
We study the variance reduction for noisy energy estimators, which promotes much more effective swaps.
We obtain state-of-the-art results in optimization and uncertainty estimates on synthetic experiments and image data.
arXiv Detail & Related papers (2020-10-02T16:23:35Z)
- Hessian-Free High-Resolution Nesterov Acceleration for Sampling [55.498092486970364]
Nesterov's Accelerated Gradient (NAG) for optimization has better performance than its continuous time limit (noiseless kinetic Langevin) when a finite step-size is employed.
This work explores the sampling counterpart of this phenomenon and proposes a diffusion process, whose discretizations can yield accelerated gradient-based MCMC methods.
arXiv Detail & Related papers (2020-06-16T15:07:37Z)
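For the mean-field Langevin dynamics entry above, the following is a minimal finite-particle sketch under illustrative assumptions: a quadratic confining potential, a quadratic pairwise interaction, and arbitrary step size and entropy-regularization strength, none of which come from the cited paper. It only shows the defining feature of MFLD, namely a drift that depends on the empirical distribution of the particle system.

```python
# Minimal finite-particle sketch of mean-field Langevin dynamics (MFLD).
# Assumptions (not from the cited paper): V(x) = |x|^2 / 2, W(x, y) = |x - y|^2 / 2,
# and illustrative step size / regularization strength.
import numpy as np

rng = np.random.default_rng(0)

def grad_V(x):
    # gradient of the confining potential V(x) = |x|^2 / 2
    return x

def grad_W(x, y):
    # gradient in x of the pairwise interaction W(x, y) = |x - y|^2 / 2
    return x - y

def mean_field_langevin(n_particles=200, steps=5_000, eta=1e-2, lam=0.1, dim=2):
    X = rng.standard_normal((n_particles, dim))   # initial particle cloud
    for _ in range(steps):
        # distribution-dependent drift: interaction averaged over the empirical measure
        interaction = grad_W(X[:, None, :], X[None, :, :]).mean(axis=1)
        drift = -(grad_V(X) + interaction)
        # Euler-Maruyama step with noise scale set by the entropy regularization lam
        X = X + eta * drift + np.sqrt(2.0 * eta * lam) * rng.standard_normal(X.shape)
    return X

particles = mean_field_langevin()
print("empirical mean:", particles.mean(axis=0))
```

Each particle's update uses the whole particle cloud, which is exactly the finite-particle approximation whose error the cited paper's propagation-of-chaos framework controls.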