Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC
via Variance Reduction
- URL: http://arxiv.org/abs/2010.01084v2
- Date: Thu, 18 Mar 2021 16:24:47 GMT
- Title: Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC
via Variance Reduction
- Authors: Wei Deng and Qi Feng and Georgios Karagiannis and Guang Lin and Faming
Liang
- Abstract summary: We study the variance reduction for noisy energy estimators, which promotes much more effective swaps.
We obtain the state-of-the-art results in optimization and uncertainty estimates for synthetic experiments and image data.
- Score: 24.794221009364772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Replica exchange stochastic gradient Langevin dynamics (reSGLD) has shown
promise in accelerating the convergence in non-convex learning; however, an
excessively large correction for avoiding biases from noisy energy estimators
has limited the potential of the acceleration. To address this issue, we study
the variance reduction for noisy energy estimators, which promotes much more
effective swaps. Theoretically, we provide a non-asymptotic analysis on the
exponential acceleration for the underlying continuous-time Markov jump
process; moreover, we consider a generalized Girsanov theorem which includes
the change of Poisson measure to overcome the crude discretization based on
Gr\"{o}nwall's inequality and yields a much tighter error in the 2-Wasserstein
($\mathcal{W}_2$) distance. Numerically, we conduct extensive experiments and
obtain the state-of-the-art results in optimization and uncertainty estimates
for synthetic experiments and image data.
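Below is a minimal sketch of the recipe described in the abstract, assuming a toy one-dimensional Gaussian dataset: two SGLD chains run at a low and a high temperature, the energy used in the swap test is estimated with a control variate (an anchor point whose full-data energy is refreshed occasionally), and the swap acceptance subtracts a correction term for the residual estimator noise. The quadratic energy, the correction constant, and all hyperparameters are illustrative assumptions, not the authors' reference implementation.

```python
# A sketch of variance-reduced replica exchange SGLD on a toy 1-D Gaussian model.
# Everything here (energy, correction constant, hyperparameters) is illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy data; per-example energy l_i(theta) = 0.5 * (theta - x_i)^2.
X = rng.normal(loc=1.0, scale=2.0, size=10_000)
N, BATCH = len(X), 100

def grad_minibatch(theta, idx):
    """Unbiased stochastic gradient of the full energy U(theta) = sum_i l_i(theta)."""
    return (N / len(idx)) * np.sum(theta - X[idx])

def energy_full(theta):
    """Exact full-data energy, used only to refresh the control-variate anchor."""
    return 0.5 * np.sum((theta - X) ** 2)

def energy_cv(theta, idx, anchor, anchor_energy):
    """Control-variate estimator:
    U_hat(theta) = (N/n) * sum_{i in B} [l_i(theta) - l_i(anchor)] + U(anchor)."""
    diff = 0.5 * (theta - X[idx]) ** 2 - 0.5 * (anchor - X[idx]) ** 2
    return (N / len(idx)) * np.sum(diff) + anchor_energy

tau_lo, tau_hi = 0.1, 10.0      # low-temperature (exploitation) and high-temperature (exploration) chains
lr = 1e-6                       # SGLD step size (assumed)
correction = 1e3                # assumed bound compensating the residual estimator variance
theta = np.array([5.0, -5.0])   # [low-temperature chain, high-temperature chain]
anchor = theta.copy()
anchor_E = np.array([energy_full(t) for t in theta])

for step in range(5_000):
    idx = rng.choice(N, BATCH, replace=False)
    for k, tau in enumerate((tau_lo, tau_hi)):
        # SGLD step: theta <- theta - lr * grad + sqrt(2 * lr * tau) * N(0, 1).
        theta[k] += -lr * grad_minibatch(theta[k], idx) \
                    + np.sqrt(2.0 * lr * tau) * rng.normal()

    if step % 100 == 0:
        # Occasionally refresh the control-variate anchors with exact energies.
        anchor = theta.copy()
        anchor_E = np.array([energy_full(t) for t in theta])

    # Swap attempt with variance-reduced energy estimates minus a correction term.
    u_lo = energy_cv(theta[0], idx, anchor[0], anchor_E[0])
    u_hi = energy_cv(theta[1], idx, anchor[1], anchor_E[1])
    d_inv_tau = 1.0 / tau_lo - 1.0 / tau_hi
    log_swap = d_inv_tau * (u_lo - u_hi) - d_inv_tau ** 2 * correction
    if np.log(rng.uniform()) < min(0.0, log_swap):
        theta = theta[::-1].copy()
        anchor, anchor_E = anchor[::-1].copy(), anchor_E[::-1].copy()

print("low-temperature chain estimate:", theta[0], "(data mean is about 1.0)")
```

Because the control variate shrinks the variance of the energy estimates, the correction term can be kept small, so the two chains swap more often and the low-temperature chain benefits from the high-temperature chain's exploration; this is the acceleration mechanism the abstract refers to.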
Related papers
- Accelerated forward-backward and Douglas-Rachford splitting dynamics [0.0]
We study convergence properties of continuous-time variants of accelerated Forward-Backward (FB) and Douglas-Rachford (DR) splitting algorithms.
For FB splitting dynamics, we demonstrate that the accelerated exponential convergence rate carries over to general strongly convex problems.
arXiv Detail & Related papers (2024-07-30T07:52:22Z) - Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation [51.248784084461334]
We prove new convergence rates for a generalized version of Nesterov acceleration under interpolation conditions.
Our analysis reduces the dependence on the strong growth constant from $\rho$ to $\sqrt{\rho}$ as compared to prior work.
arXiv Detail & Related papers (2024-04-03T00:41:19Z) - Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise [16.12834917344859]
It is widely conjectured that the heavy-ball momentum method can provide accelerated convergence and should work well in large-batch settings.
We show that heavy-ball momentum can provide $\tilde{\mathcal{O}}(\sqrt{\kappa})$ accelerated convergence of the bias term of SGD while still achieving a near-optimal convergence rate.
This means SGD with heavy-ball momentum is useful in large-batch settings such as distributed machine learning or federated learning.
arXiv Detail & Related papers (2023-12-22T09:58:39Z) - Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte
Carlo [4.656426393230839]
The rise of artificial intelligence (AI) hinges on the efficient training of modern deep neural networks (DNNs) for non-convex optimization and uncertainty quantification.
In this thesis, we propose a tool to handle the exploration-exploitation problem in Monte Carlo methods.
We also propose two dynamic importance sampling algorithms for the underlying ordinary differential equation (ODE) system.
arXiv Detail & Related papers (2023-05-30T18:25:11Z) - Non-Parametric Learning of Stochastic Differential Equations with Non-asymptotic Fast Rates of Convergence [65.63201894457404]
We propose a novel non-parametric learning paradigm for the identification of drift and diffusion coefficients of non-linear stochastic differential equations.
The key idea essentially consists of fitting an RKHS-based approximation of the corresponding Fokker-Planck equation to such observations.
arXiv Detail & Related papers (2023-05-24T20:43:47Z) - Computing the Variance of Shuffling Stochastic Gradient Algorithms via
Power Spectral Density Analysis [6.497816402045099]
Two common alternatives to stochastic gradient descent (SGD) with theoretical benefits are random reshuffling (SGD-RR) and shuffle-once (SGD-SO).
We study the stationary variances of SGD, SGD-RR and SGD-SO, whose leading terms decrease in this order, and obtain simple approximations; a minimal sketch contrasting these three shuffling schemes appears after this list.
arXiv Detail & Related papers (2022-06-01T17:08:04Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion [67.66101533752605]
Langevin diffusion is a powerful method for non-convex optimization.
We propose replica exchange, which swaps Langevin diffusions with different temperatures.
By discretizing the replica exchange Langevin diffusion, we obtain a discrete-time algorithm.
arXiv Detail & Related papers (2020-07-04T02:52:11Z) - Hessian-Free High-Resolution Nesterov Acceleration for Sampling [55.498092486970364]
Nesterov's Accelerated Gradient (NAG) for optimization has better performance than its continuous-time limit (noiseless kinetic Langevin) when a finite step size is employed.
This work explores the sampling counterpart of this phenomenon and proposes a diffusion process whose discretizations can yield accelerated gradient-based MCMC methods.
arXiv Detail & Related papers (2020-06-16T15:07:37Z)
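As referenced in the shuffling-variance entry above, here is a minimal sketch contrasting the three sampling schemes that paper compares: plain SGD (with-replacement sampling), random reshuffling (SGD-RR, a fresh permutation every epoch), and shuffle-once (SGD-SO, one permutation reused in every epoch). The least-squares objective, step size, and epoch count are assumptions chosen only for illustration.

```python
# A sketch comparing index-ordering schemes for SGD on a toy least-squares problem.
# The model, step size, and epoch count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1_000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=1_000)
N, lr, epochs = len(X), 1e-3, 50

def run_sgd(order_fn):
    """Run SGD where order_fn(epoch) returns the index order used in that epoch."""
    w = np.zeros(X.shape[1])
    for epoch in range(epochs):
        for i in order_fn(epoch):
            w -= lr * (X[i] @ w - y[i]) * X[i]   # per-example least-squares gradient
    return w

w_sgd = run_sgd(lambda _: rng.integers(0, N, size=N))  # plain SGD: sample with replacement
w_rr = run_sgd(lambda _: rng.permutation(N))           # SGD-RR: fresh permutation each epoch
perm = rng.permutation(N)
w_so = run_sgd(lambda _: perm)                         # SGD-SO: one permutation, reused

for name, w in [("SGD", w_sgd), ("SGD-RR", w_rr), ("SGD-SO", w_so)]:
    print(name, "residual norm:", np.linalg.norm(X @ w - y))
```

Only the ordering of the example indices differs between the three runs; the paper's contribution is a power-spectral-density analysis of how that ordering affects the stationary variance of the iterates.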
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.