Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC
via Variance Reduction
- URL: http://arxiv.org/abs/2010.01084v2
- Date: Thu, 18 Mar 2021 16:24:47 GMT
- Title: Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC
via Variance Reduction
- Authors: Wei Deng and Qi Feng and Georgios Karagiannis and Guang Lin and Faming
Liang
- Abstract summary: We study the variance reduction for noisy energy estimators, which promotes much more effective swaps.
We obtain the state-of-the-art results in optimization and uncertainty estimates for synthetic experiments and image data.
- Score: 24.794221009364772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Replica exchange stochastic gradient Langevin dynamics (reSGLD) has shown
promise in accelerating the convergence in non-convex learning; however, an
excessively large correction for avoiding biases from noisy energy estimators
has limited the potential of the acceleration. To address this issue, we study
the variance reduction for noisy energy estimators, which promotes much more
effective swaps. Theoretically, we provide a non-asymptotic analysis on the
exponential acceleration for the underlying continuous-time Markov jump
process; moreover, we consider a generalized Girsanov theorem which includes
the change of Poisson measure to overcome the crude discretization based on
Gr\"{o}nwall's inequality and yields a much tighter error in the 2-Wasserstein
($\mathcal{W}_2$) distance. Numerically, we conduct extensive experiments and
obtain the state-of-the-art results in optimization and uncertainty estimates
for synthetic experiments and image data.
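Below is a minimal sketch of the recipe described in the abstract, assuming a toy one-dimensional Gaussian dataset: two SGLD chains run at a low and a high temperature, the energy used in the swap test is estimated with a control variate (an anchor point whose full-data energy is refreshed occasionally), and the swap acceptance subtracts a correction term for the residual estimator noise. The quadratic energy, the correction constant, and all hyperparameters are illustrative assumptions, not the authors' reference implementation.

```python
# A sketch of variance-reduced replica exchange SGLD on a toy 1-D Gaussian model.
# Everything here (energy, correction constant, hyperparameters) is illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy data; per-example energy l_i(theta) = 0.5 * (theta - x_i)^2.
X = rng.normal(loc=1.0, scale=2.0, size=10_000)
N, BATCH = len(X), 100

def grad_minibatch(theta, idx):
    """Unbiased stochastic gradient of the full energy U(theta) = sum_i l_i(theta)."""
    return (N / len(idx)) * np.sum(theta - X[idx])

def energy_full(theta):
    """Exact full-data energy, used only to refresh the control-variate anchor."""
    return 0.5 * np.sum((theta - X) ** 2)

def energy_cv(theta, idx, anchor, anchor_energy):
    """Control-variate estimator:
    U_hat(theta) = (N/n) * sum_{i in B} [l_i(theta) - l_i(anchor)] + U(anchor)."""
    diff = 0.5 * (theta - X[idx]) ** 2 - 0.5 * (anchor - X[idx]) ** 2
    return (N / len(idx)) * np.sum(diff) + anchor_energy

tau_lo, tau_hi = 0.1, 10.0      # low-temperature (exploitation) and high-temperature (exploration) chains
lr = 1e-6                       # SGLD step size (assumed)
correction = 1e3                # assumed bound compensating the residual estimator variance
theta = np.array([5.0, -5.0])   # [low-temperature chain, high-temperature chain]
anchor = theta.copy()
anchor_E = np.array([energy_full(t) for t in theta])

for step in range(5_000):
    idx = rng.choice(N, BATCH, replace=False)
    for k, tau in enumerate((tau_lo, tau_hi)):
        # SGLD step: theta <- theta - lr * grad + sqrt(2 * lr * tau) * N(0, 1).
        theta[k] += -lr * grad_minibatch(theta[k], idx) \
                    + np.sqrt(2.0 * lr * tau) * rng.normal()

    if step % 100 == 0:
        # Occasionally refresh the control-variate anchors with exact energies.
        anchor = theta.copy()
        anchor_E = np.array([energy_full(t) for t in theta])

    # Swap attempt with variance-reduced energy estimates minus a correction term.
    u_lo = energy_cv(theta[0], idx, anchor[0], anchor_E[0])
    u_hi = energy_cv(theta[1], idx, anchor[1], anchor_E[1])
    d_inv_tau = 1.0 / tau_lo - 1.0 / tau_hi
    log_swap = d_inv_tau * (u_lo - u_hi) - d_inv_tau ** 2 * correction
    if np.log(rng.uniform()) < min(0.0, log_swap):
        theta = theta[::-1].copy()
        anchor, anchor_E = anchor[::-1].copy(), anchor_E[::-1].copy()

print("low-temperature chain estimate:", theta[0], "(data mean is about 1.0)")
```

Because the control variate shrinks the variance of the energy estimates, the correction term can be kept small, so the two chains swap more often and the low-temperature chain benefits from the high-temperature chain's exploration; this is the acceleration mechanism the abstract refers to.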
Related papers
- Accelerated forward-backward and Douglas-Rachford splitting dynamics [0.0]
We study convergence properties of continuous-time variants of accelerated Forward-Backward (FB) and Douglas-Rachford (DR) splitting algorithms.
For FB splitting dynamics, we demonstrate that the accelerated exponential convergence rate carries over to general strongly convex problems.
arXiv Detail & Related papers (2024-07-30T07:52:22Z) - Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation [51.248784084461334]
We prove new convergence rates for a generalized version of Nesterov acceleration under interpolation conditions.
Our analysis reduces the dependence on the strong growth constant from $\rho$ to $\sqrt{\rho}$ as compared to prior work.
arXiv Detail & Related papers (2024-04-03T00:41:19Z) - Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise [16.12834917344859]
It is widely conjectured that the heavy-ball momentum method can provide accelerated convergence and should work well in large-batch settings.
We show that heavy-ball momentum can provide $\tilde{\mathcal{O}}(\sqrt{\kappa})$ accelerated convergence of the bias term of SGD while still achieving a near-optimal convergence rate.
This means SGD with heavy-ball momentum is useful in large-batch settings such as distributed machine learning or federated learning.
arXiv Detail & Related papers (2023-12-22T09:58:39Z) - Non-convex Bayesian Learning via Stochastic Gradient Markov Chain Monte
Carlo [4.656426393230839]
The rise of artificial intelligence (AI) hinges on the efficient training of modern deep neural networks (DNNs) for non-convex optimization and uncertainty quantification.
In this thesis, we propose a tool to handle the exploration-exploitation problem in Monte Carlo methods.
We also propose two dynamic importance sampling algorithms for the underlying ordinary differential equation (ODE) system.
arXiv Detail & Related papers (2023-05-30T18:25:11Z) - Non-Parametric Learning of Stochastic Differential Equations with Non-asymptotic Fast Rates of Convergence [65.63201894457404]
We propose a novel non-parametric learning paradigm for the identification of drift and diffusion coefficients of non-linear stochastic differential equations.
The key idea essentially consists of fitting an RKHS-based approximation of the corresponding Fokker-Planck equation to such observations.
arXiv Detail & Related papers (2023-05-24T20:43:47Z) - Computing the Variance of Shuffling Stochastic Gradient Algorithms via
Power Spectral Density Analysis [6.497816402045099]
Two common alternatives to stochastic gradient descent (SGD) with theoretical benefits are random reshuffling (SGD-RR) and shuffle-once (SGD-SO).
We study the stationary variances of SGD, SGD-RR and SGD-SO, whose leading terms decrease in this order, and obtain simple approximations; a minimal sketch contrasting these three shuffling schemes appears after this list.
arXiv Detail & Related papers (2022-06-01T17:08:04Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion [67.66101533752605]
Langevin diffusion is a powerful method for non-convex optimization.
We propose replica exchange, which swaps Langevin diffusions with different temperatures.
By discretizing the replica exchange Langevin diffusion, we obtain a discrete-time algorithm.
arXiv Detail & Related papers (2020-07-04T02:52:11Z) - Hessian-Free High-Resolution Nesterov Acceleration for Sampling [55.498092486970364]
Nesterov's Accelerated Gradient (NAG) for optimization has better performance than its continuous-time limit (noiseless kinetic Langevin) when a finite step size is employed.
This work explores the sampling counterpart of this phenomenon and proposes a diffusion process whose discretizations can yield accelerated gradient-based MCMC methods.
arXiv Detail & Related papers (2020-06-16T15:07:37Z)
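As referenced in the shuffling-variance entry above, here is a minimal sketch contrasting the three sampling schemes that paper compares: plain SGD (with-replacement sampling), random reshuffling (SGD-RR, a fresh permutation every epoch), and shuffle-once (SGD-SO, one permutation reused in every epoch). The least-squares objective, step size, and epoch count are assumptions chosen only for illustration.

```python
# A sketch comparing index-ordering schemes for SGD on a toy least-squares problem.
# The model, step size, and epoch count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1_000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=1_000)
N, lr, epochs = len(X), 1e-3, 50

def run_sgd(order_fn):
    """Run SGD where order_fn(epoch) returns the index order used in that epoch."""
    w = np.zeros(X.shape[1])
    for epoch in range(epochs):
        for i in order_fn(epoch):
            w -= lr * (X[i] @ w - y[i]) * X[i]   # per-example least-squares gradient
    return w

w_sgd = run_sgd(lambda _: rng.integers(0, N, size=N))  # plain SGD: sample with replacement
w_rr = run_sgd(lambda _: rng.permutation(N))           # SGD-RR: fresh permutation each epoch
perm = rng.permutation(N)
w_so = run_sgd(lambda _: perm)                         # SGD-SO: one permutation, reused

for name, w in [("SGD", w_sgd), ("SGD-RR", w_rr), ("SGD-SO", w_so)]:
    print(name, "residual norm:", np.linalg.norm(X @ w - y))
```

Only the ordering of the example indices differs between the three runs; the paper's contribution is a power-spectral-density analysis of how that ordering affects the stationary variance of the iterates.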
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.