Non-convex Learning via Replica Exchange Stochastic Gradient MCMC
- URL: http://arxiv.org/abs/2008.05367v3
- Date: Mon, 22 Mar 2021 15:55:25 GMT
- Title: Non-convex Learning via Replica Exchange Stochastic Gradient MCMC
- Authors: Wei Deng, Qi Feng, Liyao Gao, Faming Liang, Guang Lin
- Abstract summary: We propose an adaptive replica exchange SGMCMC (reSGMCMC) to automatically correct the bias and study the corresponding properties.
Empirically, we test the algorithm through extensive experiments on various setups and obtain state-of-the-art results on CIFAR10, CIFAR100, and SVHN in both supervised and semi-supervised learning tasks.
- Score: 25.47669573608621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Replica exchange Monte Carlo (reMC), also known as parallel tempering, is an
important technique for accelerating the convergence of the conventional Markov
Chain Monte Carlo (MCMC) algorithms. However, such a method requires the
evaluation of the energy function based on the full dataset and is not scalable
to big data. The naïve implementation of reMC in mini-batch settings
introduces large biases, so the method cannot be directly extended to
stochastic gradient MCMC (SGMCMC), the standard sampling method for simulating from deep
neural networks (DNNs). In this paper, we propose an adaptive replica exchange
SGMCMC (reSGMCMC) to automatically correct the bias and study the corresponding
properties. The analysis implies an acceleration-accuracy trade-off in the
numerical discretization of a Markov jump process in a stochastic environment.
Empirically, we test the algorithm through extensive experiments on various
setups and obtain the state-of-the-art results on CIFAR10, CIFAR100, and SVHN
in both supervised learning and semi-supervised learning tasks.
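To make the mechanism concrete, here is a minimal sketch of replica exchange SGLD on a toy double-well energy. It is our illustration, not the authors' released code: two SGLD chains run at a low and a high temperature, and their states swap with a probability whose log-rate subtracts a variance correction so that noisy mini-batch energy estimates do not inflate the swap rate. The noise model, the hyperparameter names (eta, tau_low, tau_high), and the correction factor F are assumptions for illustration; F = 1 targets unbiasedness, while larger F trades accuracy for acceleration, mirroring the trade-off described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def U(theta):
    """Toy non-convex energy: a 1-D double well with minima at +/-1."""
    return (theta**2 - 1.0)**2

def grad_U(theta):
    return 4.0 * theta * (theta**2 - 1.0)

def noisy(f, theta, sigma=0.5):
    """Mini-batch estimates modeled as the exact value plus Gaussian noise (an assumption)."""
    return f(theta) + sigma * rng.normal()

def resgld(n_steps=20000, eta=1e-3, tau_low=0.1, tau_high=1.0, F=1.0, sigma=0.5):
    """Sketch of replica exchange SGLD with a bias-corrected swap.

    F >= 1 scales the variance correction: F = 1 aims at unbiasedness, larger F
    swaps more often at the price of bias (the acceleration-accuracy trade-off).
    """
    th_lo, th_hi = -1.0, 1.0            # one replica per temperature
    dtau = 1.0 / tau_low - 1.0 / tau_high
    samples = []
    for _ in range(n_steps):
        # SGLD update for each replica at its own temperature
        th_lo += -eta * noisy(grad_U, th_lo, sigma) + np.sqrt(2 * eta * tau_low) * rng.normal()
        th_hi += -eta * noisy(grad_U, th_hi, sigma) + np.sqrt(2 * eta * tau_high) * rng.normal()
        # Corrected swap: subtract dtau * sigma_hat^2 / F from the energy gap so the
        # noisy energy estimates do not inflate the swap rate. In practice sigma_hat^2
        # is estimated online; here we use the known noise variance of the difference.
        sigma_hat2 = 2 * sigma**2
        log_s = dtau * (noisy(U, th_lo, sigma) - noisy(U, th_hi, sigma) - dtau * sigma_hat2 / F)
        if np.log(rng.uniform()) < min(0.0, log_s):
            th_lo, th_hi = th_hi, th_lo  # exchange the replicas
        samples.append(th_lo)            # the low-temperature chain targets the posterior
    return np.array(samples)

print(resgld().mean())  # both wells get visited thanks to swaps
```

The low-temperature chain does the fine-grained exploitation while the high-temperature chain explores; swaps let the target chain escape local modes without waiting for the Langevin diffusion to cross energy barriers on its own.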
Related papers
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC performs both parameter estimation and particle proposal adaptation efficiently and entirely on-the-fly.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
- Learning Energy-Based Prior Model with Diffusion-Amortized MCMC [89.95629196907082]
The common practice of learning latent-space EBMs with non-convergent short-run MCMC for prior and posterior sampling hinders further progress.
We introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it.
arXiv Detail & Related papers (2023-10-05T00:23:34Z)
- Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion [56.38386580040991]
The Consistency Trajectory Model (CTM) is a generalization of Consistency Models (CMs).
CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance.
Unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods.
arXiv Detail & Related papers (2023-10-01T05:07:17Z)
- Reverse Diffusion Monte Carlo [19.35592726471155]
We propose a novel Monte Carlo sampling algorithm, reverse diffusion Monte Carlo (rdMC), which is distinct from Markov chain Monte Carlo (MCMC) methods.
arXiv Detail & Related papers (2023-07-05T05:42:03Z)
- Knowledge Removal in Sampling-based Bayesian Inference [86.14397783398711]
When even a single data deletion request arrives, companies may need to delete whole models that were learned with massive resources.
Existing works propose methods to remove knowledge learned from data for explicitly parameterized models.
In this paper, we propose the first machine unlearning algorithm for MCMC.
arXiv Detail & Related papers (2022-03-24T10:03:01Z)
- Stochastic Gradient MCMC with Multi-Armed Bandit Tuning [2.2559617939136505]
We propose a novel bandit-based algorithm that tunes SGMCMC hyperparameters to maximize the accuracy of the posterior approximation.
We support our results with experiments on both simulated and real datasets, and find that this method is practical for a wide range of application areas (a toy illustration follows this entry).
arXiv Detail & Related papers (2021-05-27T11:00:31Z)
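As a rough, hypothetical illustration of the bandit-tuning idea in the entry above (not the paper's actual algorithm), the sketch below runs UCB1 over a handful of candidate SGLD step sizes on a standard normal target, using the error of the empirical second moment as a cheap reward proxy for posterior accuracy. The arm set, the reward, and the target are all our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def grad_log_post(theta):
    return -theta                     # stand-in target: a standard normal posterior

step_sizes = [1e-3, 1e-2, 1e-1]       # bandit arms: candidate SGLD step sizes
counts = np.zeros(len(step_sizes))
values = np.zeros(len(step_sizes))    # running mean reward per arm

theta = 0.0
for round_ in range(200):
    # UCB1 arm selection: play each arm once, then balance exploitation and exploration
    if round_ < len(step_sizes):
        arm = round_
    else:
        arm = int(np.argmax(values + np.sqrt(2 * np.log(round_ + 1) / counts)))
    eta = step_sizes[arm]
    # Run a short SGLD segment with the chosen step size
    chunk = []
    for _ in range(50):
        theta += eta * grad_log_post(theta) + np.sqrt(2 * eta) * rng.normal()
        chunk.append(theta)
    # Reward: negative squared error of the segment's second moment against the
    # known target moment (1.0 for N(0, 1)) -- a crude accuracy proxy.
    reward = -(np.mean(np.square(chunk)) - 1.0) ** 2
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

print("preferred step size:", step_sizes[int(np.argmax(values))])
```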
- Sampling in Combinatorial Spaces with SurVAE Flow Augmented MCMC [83.48593305367523]
Hybrid Monte Carlo is a powerful Markov Chain Monte Carlo method for sampling from complex continuous distributions.
We introduce a new approach based on augmenting Monte Carlo methods with SurVAE Flows to sample from discrete distributions.
We demonstrate the efficacy of our algorithm on a range of examples from statistics, computational physics and machine learning, and observe improvements compared to alternative algorithms.
arXiv Detail & Related papers (2021-02-04T02:21:08Z)
- Accelerating MCMC algorithms through Bayesian Deep Networks [7.054093620465401]
Markov Chain Monte Carlo (MCMC) algorithms are commonly used for their versatility in sampling from complicated probability distributions.
As the dimension of the distribution grows, the computational cost of exploring the sampling space satisfactorily becomes challenging.
We show an alternative way of performing adaptive MCMC, using the outcome of Bayesian Neural Networks as the initial proposal for the Markov Chain (a generic independence-sampler sketch follows this entry).
arXiv Detail & Related papers (2020-11-29T04:29:00Z)
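The entry above amounts to an independence Metropolis-Hastings sampler whose proposal is learned rather than hand-tuned. Below is a generic sketch of that pattern with a fixed Gaussian standing in for the Bayesian-neural-network proposal (our simplification): the acceptance ratio combines the target gain with a proposal correction, so any reasonable learned proposal leaves the target distribution invariant.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_target(x):
    """Unnormalized log-density of a bimodal target (equal mixture at +/-2)."""
    return np.logaddexp(-0.5 * (x - 2.0)**2, -0.5 * (x + 2.0)**2)

# Stand-in for a learned proposal: in the paper's setting this would come from
# a Bayesian neural network; here it is a fixed wide Gaussian (our assumption).
prop_mu, prop_sigma = 0.0, 3.0

def log_proposal(x):
    return -0.5 * ((x - prop_mu) / prop_sigma)**2

def independence_mh(n_steps=50000, x0=0.0):
    x, lp_x, lq_x = x0, log_target(x0), log_proposal(x0)
    out = []
    for _ in range(n_steps):
        y = prop_mu + prop_sigma * rng.normal()       # draw from the proposal
        lp_y, lq_y = log_target(y), log_proposal(y)
        # Independence-sampler acceptance: target gain times proposal correction
        if np.log(rng.uniform()) < (lp_y - lp_x) + (lq_x - lq_y):
            x, lp_x, lq_x = y, lp_y, lq_y
        out.append(x)
    return np.array(out)

samples = independence_mh()
print(samples.mean(), samples.std())  # roughly 0 and roughly 2.2 for this target
```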
- An adaptive Hessian approximated stochastic gradient MCMC method [12.93317525451798]
We present an adaptive Hessian-approximated stochastic gradient MCMC method to incorporate local geometric information while sampling from the posterior.
We adopt a magnitude-based weight pruning method to enforce the sparsity of the network.
arXiv Detail & Related papers (2020-10-03T16:22:15Z)
- Improving Sampling Accuracy of Stochastic Gradient MCMC Methods via Non-uniform Subsampling of Gradients [54.90670513852325]
We propose a non-uniform subsampling scheme, Exponentially Weighted Stochastic Gradients (EWSG), to improve sampling accuracy.
EWSG is designed so that a non-uniform-gradient MCMC method mimics the statistical behavior of a batch-gradient MCMC method.
In our practical implementation of EWSG, the non-uniform subsampling is performed efficiently via a Metropolis-Hastings chain on the data index (a generic sketch of such an index chain follows this entry).
arXiv Detail & Related papers (2020-02-20T18:56:18Z)
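The Metropolis-Hastings chain over data indices mentioned in the last entry can be sketched generically. The weights below are illustrative stand-ins for EWSG's actual ones: the point is that a uniform proposal plus a weight-ratio acceptance samples indices in proportion to their weights without ever normalizing the full weight vector.

```python
import numpy as np

rng = np.random.default_rng(2)

def mh_index_chain(weights, n_draws, start=0):
    """Sample data indices with probability proportional to `weights`
    via a Metropolis-Hastings chain with uniform proposals.

    Only the two weights involved in each step are touched, so the full
    weight vector never needs to be normalized -- the property that makes
    index chains attractive inside gradient-MCMC inner loops.
    """
    n = len(weights)
    i = start
    out = []
    for _ in range(n_draws):
        j = rng.integers(n)                              # uniform proposal
        if rng.uniform() < min(1.0, weights[j] / weights[i]):
            i = j                                        # accept: move to the new index
        out.append(i)
    return np.array(out)

# Illustrative per-datum weights, e.g. proportional to gradient magnitudes.
w = rng.uniform(0.1, 1.0, size=1000)
draws = mh_index_chain(w, n_draws=50000)
# Empirical frequencies track the normalized weights:
freq = np.bincount(draws, minlength=len(w)) / len(draws)
print(np.allclose(freq, w / w.sum(), atol=5e-3))
```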