Stochastic Gradient MCMC with Multi-Armed Bandit Tuning
- URL: http://arxiv.org/abs/2105.13059v2
- Date: Fri, 28 May 2021 13:49:38 GMT
- Title: Stochastic Gradient MCMC with Multi-Armed Bandit Tuning
- Authors: Jeremie Coullon, Leah South, Christopher Nemeth
- Abstract summary: We propose a novel bandit-based algorithm that tunes SGMCMC hyperparameters to maximize the accuracy of the posterior approximation.
We support our results with experiments on both simulated and real datasets, and find that this method is practical for a wide range of application areas.
- Score: 2.2559617939136505
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic gradient Markov chain Monte Carlo (SGMCMC) is a popular class of
algorithms for scalable Bayesian inference. However, these algorithms include
hyperparameters such as step size or batch size that influence the accuracy of
estimators based on the obtained samples. As a result, these hyperparameters
must be tuned by the practitioner and currently no principled and automated way
to tune them exists. Standard MCMC tuning methods based on acceptance rates
cannot be used for SGMCMC, thus requiring alternative tools and diagnostics. We
propose a novel bandit-based algorithm that tunes SGMCMC hyperparameters to
maximize the accuracy of the posterior approximation by minimizing the kernel
Stein discrepancy (KSD). We provide theoretical results supporting this
approach and assess alternative metrics to KSD. We support our results with
experiments on both simulated and real datasets, and find that this method is
practical for a wide range of application areas.
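The abstract's idea of treating SGMCMC hyperparameters as bandit arms scored by kernel Stein discrepancy can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: it assumes a 1-D standard normal target, an epsilon-greedy bandit over three illustrative step sizes for SGLD, and the IMQ Stein kernel; the constants `c`, `beta`, chain length, and arm values are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    # Score function of the N(0, 1) target: grad log p(x) = -x
    return -x

def sgld_chain(step, n=500, x0=0.0):
    # Langevin dynamics with step size `step` (full gradients here,
    # standing in for stochastic gradients)
    x, xs = x0, []
    for _ in range(n):
        x = x + 0.5 * step * score(x) + np.sqrt(step) * rng.standard_normal()
        xs.append(x)
    return np.array(xs)

def ksd(xs, c=1.0, beta=-0.5):
    # V-statistic estimate of KSD with the IMQ kernel
    # k(x, y) = (c^2 + (x - y)^2)^beta; k0 is the Stein kernel
    s = score(xs)
    d = xs[:, None] - xs[None, :]
    u = c**2 + d**2
    k0 = (s[:, None] * s[None, :] * u**beta
          - s[:, None] * 2 * beta * d * u**(beta - 1)
          + s[None, :] * 2 * beta * d * u**(beta - 1)
          - 2 * beta * u**(beta - 1)
          - 4 * beta * (beta - 1) * d**2 * u**(beta - 2))
    return np.sqrt(max(k0.mean(), 0.0))

def tune_step(arms, rounds=20, eps=0.2):
    # Epsilon-greedy bandit: the reward for pulling an arm is the
    # negative KSD of a fresh short SGLD run at that step size
    rewards = {a: [] for a in arms}
    for t in range(rounds):
        if t < len(arms) or rng.random() < eps:
            arm = arms[t % len(arms)]  # explore (each arm tried at least once)
        else:
            arm = max(arms, key=lambda a: np.mean(rewards[a]))  # exploit
        rewards[arm].append(-ksd(sgld_chain(arm)))
    return max(arms, key=lambda a: np.mean(rewards[a]))

best = tune_step([1e-4, 1e-2, 1.0])
```

The paper's method is more sophisticated (and tunes multiple hyperparameters), but the structure is the same: each bandit round runs the sampler under a candidate configuration and scores the resulting samples by KSD, which requires only the score function, not acceptance rates.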
Related papers
- Tuning diagonal scale matrices for HMC [0.0]
Three approaches for adaptively tuning diagonal scale matrices for HMC are discussed and compared.
The common practice of scaling according to estimated marginal standard deviations is taken as a benchmark.
Alternatives include scaling according to the mean log-target gradient (ISG) and a method that targets the frequency with which the underlying Hamiltonian dynamics crosses the respective medians.
arXiv Detail & Related papers (2024-03-12T10:35:40Z) - Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC performs both parameter estimation and particle proposal adaptation efficiently and entirely on-the-fly.
arXiv Detail & Related papers (2023-12-19T21:45:38Z) - Online Probabilistic Model Identification using Adaptive Recursive MCMC [8.465242072268019]
We suggest the Adaptive Recursive Markov Chain Monte Carlo (ARMCMC) method.
It eliminates the shortcomings of conventional online techniques while computing the entire probability density function of model parameters.
We demonstrate our approach using parameter estimation in a soft bending actuator and the Hunt-Crossley dynamic model.
arXiv Detail & Related papers (2022-10-23T02:06:48Z) - Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer steps for sampling.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - An adaptive Hessian approximated stochastic gradient MCMC method [12.93317525451798]
We present an adaptive Hessian approximated gradient MCMC method to incorporate local geometric information while sampling from the posterior.
We adopt a magnitude-based weight pruning method to enforce the sparsity of the network.
arXiv Detail & Related papers (2020-10-03T16:22:15Z) - Non-convex Learning via Replica Exchange Stochastic Gradient MCMC [25.47669573608621]
We propose an adaptive replica exchange SGMCMC (reSGMCMC) to automatically correct the bias and study the corresponding properties.
Empirically, we test the algorithm through extensive experiments on a variety of setups.
arXiv Detail & Related papers (2020-08-12T15:02:59Z) - A Kernel-Based Approach to Non-Stationary Reinforcement Learning in
Metric Spaces [53.47210316424326]
KeRNS is an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes.
We prove a regret bound that scales with the covering dimension of the state-action space and the total variation of the MDP with time.
arXiv Detail & Related papers (2020-07-09T21:37:13Z) - Improving Sampling Accuracy of Stochastic Gradient MCMC Methods via
Non-uniform Subsampling of Gradients [54.90670513852325]
We propose a non-uniform subsampling scheme to improve the sampling accuracy.
EWSG is designed so that a non-uniform gradient-MCMC method mimics the statistical behavior of a batch-gradient-MCMC method.
In our practical implementation of EWSG, the non-uniform subsampling is performed efficiently via a Metropolis-Hastings chain on the data index.
arXiv Detail & Related papers (2020-02-20T18:56:18Z)
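The EWSG entry above describes running a Metropolis-Hastings chain on the data index to draw non-uniform subsamples. A minimal sketch of that mechanism, assuming a uniform proposal over indices and arbitrary example weights (not the paper's gradient-based weights):

```python
import numpy as np

rng = np.random.default_rng(1)

def mh_index_chain(weights, n_steps, i0=0):
    # MH chain on {0, ..., N-1}: propose j uniformly at random and
    # accept with probability min(1, w_j / w_i), so the chain visits
    # index i with stationary probability proportional to weights[i]
    counts = np.zeros(len(weights), dtype=int)
    i = i0
    for _ in range(n_steps):
        j = rng.integers(len(weights))
        if rng.random() < weights[j] / weights[i]:
            i = j
        counts[i] += 1
    return counts

# Heavily weighted last index should dominate the visit counts
counts = mh_index_chain(np.array([1.0, 1.0, 1.0, 10.0]), 20000)
```

The appeal of this construction is that each step costs O(1) regardless of dataset size, avoiding the normalization over all N weights that exact non-uniform sampling would require.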
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.