Improving Sampling Accuracy of Stochastic Gradient MCMC Methods via
Non-uniform Subsampling of Gradients
- URL: http://arxiv.org/abs/2002.08949v3
- Date: Wed, 22 Sep 2021 22:49:34 GMT
- Title: Improving Sampling Accuracy of Stochastic Gradient MCMC Methods via
Non-uniform Subsampling of Gradients
- Authors: Ruilin Li, Xin Wang, Hongyuan Zha and Molei Tao
- Abstract summary: We propose a non-uniform subsampling scheme, exponentially weighted stochastic gradient (EWSG), to improve sampling accuracy.
EWSG is designed so that a non-uniform-SG-MCMC method mimics the statistical behavior of a batch-gradient-MCMC method.
In our practical implementation of EWSG, the non-uniform subsampling is performed efficiently via a Metropolis-Hastings chain on the data index.
- Score: 54.90670513852325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many Markov Chain Monte Carlo (MCMC) methods leverage gradient information of
the potential function of target distribution to explore sample space
efficiently. However, computing gradients can often be computationally
expensive for large scale applications, such as those in contemporary machine
learning. Stochastic Gradient (SG-)MCMC methods approximate gradients by
stochastic ones, commonly via uniformly subsampled data points, and achieve
improved computational efficiency, albeit at the price of introducing sampling
error. We propose a non-uniform subsampling scheme to improve the sampling
accuracy. The proposed exponentially weighted stochastic gradient (EWSG) is
designed so that a non-uniform-SG-MCMC method mimics the statistical behavior
of a batch-gradient-MCMC method, and hence the inaccuracy due to SG
approximation is reduced. EWSG differs from classical variance reduction (VR)
techniques as it focuses on the entire distribution instead of just the
variance; nevertheless, its reduced local variance is also proved. EWSG can
also be viewed as an extension of the importance sampling idea, successful for
stochastic-gradient-based optimizations, to sampling tasks. In our practical
implementation of EWSG, the non-uniform subsampling is performed efficiently
via a Metropolis-Hastings chain on the data index, which is coupled to the MCMC
algorithm. Numerical experiments are provided, not only to demonstrate EWSG's
effectiveness, but also to guide hyperparameter choices, and validate our
non-asymptotic global error bound despite approximations in the
implementation. Notably, while statistical accuracy is improved, convergence
speed can be comparable to the uniform version, which renders EWSG a practical
alternative to VR (though the two can also be combined).
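To make the mechanism concrete, below is a minimal Python sketch (not the authors' code) of non-uniform-subsampling Langevin dynamics in the spirit of EWSG, on a toy 1-D Gaussian location model. The index chain's acceptance weight, based on per-datum gradient magnitudes, is an illustrative stand-in for the exponential weights derived in the paper.

```python
import numpy as np

def per_datum_grad(theta, x_i):
    # Gradient of one datum's negative log-likelihood under the toy
    # model x_i ~ N(theta, 1).
    return theta - x_i

def ewsg_sgld(data, theta0, step=1e-3, n_iters=5000, mh_steps=1, seed=0):
    # Non-uniform-subsampling SGLD: a Metropolis-Hastings chain on the
    # data index picks which datum's gradient drives each Langevin step.
    rng = np.random.default_rng(seed)
    n = len(data)
    theta, idx = theta0, rng.integers(n)
    samples = []
    for _ in range(n_iters):
        for _ in range(mh_steps):
            prop = rng.integers(n)  # uniform proposal on the index
            # Hypothetical exponential weighting by gradient magnitude;
            # the paper derives the actual EWSG weights.
            log_a = (abs(per_datum_grad(theta, data[prop]))
                     - abs(per_datum_grad(theta, data[idx])))
            if np.log(rng.random()) < log_a:
                idx = prop
        g = n * per_datum_grad(theta, data[idx])  # rescaled stochastic gradient
        theta = theta - step * g + np.sqrt(2.0 * step) * rng.normal()
        samples.append(theta)
    return np.array(samples)

# Toy usage: posterior over the mean of 100 Gaussian observations.
data = np.random.default_rng(1).normal(2.0, 1.0, size=100)
print(ewsg_sgld(data, theta0=0.0)[1000:].mean())  # roughly 2.0
```

The design point mirrors the abstract: the Metropolis-Hastings chain runs on the data index and is interleaved with the sampler's own updates, so selecting a datum non-uniformly never requires a full pass over the data.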
Related papers
- Robust Approximate Sampling via Stochastic Gradient Barker Dynamics [0.0]
We introduce the stochastic gradient Barker dynamics (SGBD) algorithm, a robust alternative to Langevin-based sampling algorithms, extending the Barker proposal to the stochastic gradient framework.
We characterize the impact of stochastic gradients on the Barker transition mechanism and develop a bias-corrected version that eliminates the error due to gradient noise in the proposal.
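For intuition, here is a minimal 1-D sketch of the Barker proposal mechanism with a noisy gradient plugged in; the bias-corrected update that removes the gradient-noise error is the paper's contribution and is not reproduced here.

```python
import numpy as np

def sg_barker_step(x, grad_est, sigma, rng):
    # One Barker-style move: draw an increment z, then keep or flip its
    # sign with probability sigmoid(z * grad), which skews moves toward
    # higher log-density. grad_est may be a noisy stochastic gradient;
    # that noise is what biases this rule and what SGBD corrects.
    z = sigma * rng.normal()
    p_keep = 1.0 / (1.0 + np.exp(-z * grad_est))
    return x + z if rng.random() < p_keep else x - z
```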
arXiv Detail & Related papers (2024-05-14T23:47:02Z)
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC performs both parameter estimation and particle proposal adaptation efficiently and entirely on-the-fly.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
- Preferential Subsampling for Stochastic Gradient Langevin Dynamics [3.158346511479111]
Stochastic gradient MCMC uses an unbiased estimate of the gradient of the log-posterior computed from a small, uniformly weighted subsample of the data.
The resulting gradient estimator may exhibit high variance, which degrades sampler performance.
We demonstrate that such an approach can maintain the same level of accuracy while substantially reducing the average subsample size that is used.
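A minimal sketch of the kind of reweighted, non-uniform gradient estimator this entry describes (illustrative: scalar parameter, per-datum gradients precomputed):

```python
import numpy as np

def weighted_grad_estimate(per_datum_grads, probs, m, rng):
    # Draw m indices with replacement with probabilities `probs` and
    # reweight each gradient by 1/(m * p_i); the expectation then equals
    # the full-data gradient for any strictly positive probs. Uniform
    # probs recovers plain SGLD, while probs tilted toward large-gradient
    # data (preferential subsampling) shrink the variance.
    idx = rng.choice(len(per_datum_grads), size=m, p=probs)
    return np.sum(per_datum_grads[idx] / (m * probs[idx]))
```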
arXiv Detail & Related papers (2022-10-28T14:56:18Z)
- Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer steps for sampling.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a stochastic conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration, yet guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
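A minimal sketch of the one-sample SAG gradient table that makes this possible (hypothetical 1-D setup; the paper wraps such an estimator in a conditional gradient step):

```python
import numpy as np

def sag_estimates(per_datum_grad, theta_path, n, seed=0):
    # SAG keeps the last-seen gradient of every datum; each iteration
    # refreshes one random entry and returns the running table average,
    # so only a single fresh gradient is evaluated per iteration.
    rng = np.random.default_rng(seed)
    table = np.zeros(n)            # last-seen per-datum gradients
    avg, out = 0.0, []
    for theta in theta_path:       # follow the iterate sequence
        j = rng.integers(n)
        g_new = per_datum_grad(theta, j)
        avg += (g_new - table[j]) / n   # O(1) running-average update
        table[j] = g_new
        out.append(avg)
    return out
```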
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- An adaptive Hessian approximated stochastic gradient MCMC method [12.93317525451798]
We present an adaptive Hessian-approximated stochastic gradient MCMC method to incorporate local geometric information while sampling from the posterior.
We adopt a magnitude-based weight pruning method to enforce the sparsity of the network.
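A minimal sketch of magnitude-based weight pruning as generically described here (not necessarily the paper's exact criterion):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    # Zero out the `sparsity` fraction of entries with the smallest
    # absolute value, keeping the largest-magnitude weights.
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    thresh = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) > thresh, weights, 0.0)
```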
arXiv Detail & Related papers (2020-10-03T16:22:15Z)
- Non-convex Learning via Replica Exchange Stochastic Gradient MCMC [25.47669573608621]
We propose an adaptive replica exchange SGMCMC (reSGMCMC) to automatically correct the bias and study the corresponding properties.
Empirically, we test the algorithm through extensive experiments on various setups.
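For intuition, a minimal sketch of the standard replica exchange (parallel tempering) swap test that reSGMCMC builds on; with minibatch energy estimates this naive rule is biased, which is what the paper's adaptive correction addresses:

```python
import numpy as np

def swap_replicas(x_lo, x_hi, energy, t_lo, t_hi, rng):
    # Standard replica-exchange test between a low- and a high-temperature
    # chain: swap states with probability
    # min(1, exp((1/t_lo - 1/t_hi) * (E(x_lo) - E(x_hi)))).
    # With noisy minibatch energies this rule acquires a bias, which the
    # paper's adaptive correction removes.
    log_a = (1.0 / t_lo - 1.0 / t_hi) * (energy(x_lo) - energy(x_hi))
    if np.log(rng.random()) < log_a:
        return x_hi, x_lo
    return x_lo, x_hi
```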
arXiv Detail & Related papers (2020-08-12T15:02:59Z)
- Bayesian Sparse learning with preconditioned stochastic gradient MCMC and its applications [5.660384137948734]
We show that the proposed algorithm asymptotically converges to the correct distribution with a controllable bias under mild conditions.
arXiv Detail & Related papers (2020-06-29T20:57:20Z)