Comparison of parallel SMC and MCMC for Bayesian deep learning
- URL: http://arxiv.org/abs/2402.06173v3
- Date: Wed, 20 Aug 2025 10:50:33 GMT
- Title: Comparison of parallel SMC and MCMC for Bayesian deep learning
- Authors: Xinzhu Liang, Joseph M. Lukens, Sanjaya Lohani, Brian T. Kirby, Thomas A. Searles, Xin Qiu, Kody J. H. Law,
- Abstract summary: This work systematically compares parallel implementations of consistent (asymptotically unbiased) Bayesian deep learning algorithms.<n>We provide a proof of convergence for SMC$_parallel$ showing that it theoretically achieves the same level of convergence as a single monolithic SMC sampler.
- Score: 3.4661537979254655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work systematically compares parallel implementations of consistent (asymptotically unbiased) Bayesian deep learning algorithms: sequential Monte Carlo sampler (SMC$_\parallel$) or Markov chain Monte Carlo (MCMC$_\parallel$). We provide a proof of convergence for SMC$_\parallel$ showing that it theoretically achieves the same level of convergence as a single monolithic SMC sampler, while the reduced communication lowers wall-clock time. It is well-known that the first samples from MCMC need to be discarded to eliminate initialization bias, and that the number of discarded samples must grow like the logarithm of the number of parallel chains to control that bias for MCMC$_\parallel$. A systematic empirical numerical study on MNIST, CIFAR, and IMDb, reveals that parallel implementations of both methods perform comparably to non-parallel implementations in terms of performance and total cost, and also comparably to each other. However, both methods still require a large wall-clock time, and suffer from catastrophic non-convergence if they aren't run for long enough.
Related papers
- Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing [76.48164395646019]
Parallel-Probe is a training-free controller designed to optimize online parallel thinking.<n>It reduces sequential tokens by up to $textbf35.8$% and total token cost by over $textbf25.8$% while maintaining competitive accuracy.
arXiv Detail & Related papers (2026-02-03T18:59:41Z) - Monte Carlo and quasi-Monte Carlo integration for likelihood functions [0.0]
We compare the integration error of Monte Carlo (MC) and quasi-Monte Carlo (QMC) methods for approximating the normalizing constant of posterior distributions and certain marginal likelihoods.<n>We find that if the dimension of the integral remains fixed as $n$ and $m$ tend to infinity, the scaling rate of the relative error of MC integration includes an additional $n1/2log(n)p/2$ data-dependent factor.<n>An additional contribution of this work is a bound on the high-dimensional scaling of the star discrepancy for the Halton sequence.
arXiv Detail & Related papers (2025-06-26T19:39:30Z) - Scalable Bayesian Monte Carlo: fast uncertainty estimation beyond deep ensembles [3.4661537979254655]
This work introduces a new method called scalable Bayesian Monte Carlo (SBMC)<n>The algorithm is a parallel implementation of a consistent (asymptotically unbiased) Bayesian deep learning algorithm: Monte Carlo (SMC) or Markov chain Monte Carlo (MCMC)
arXiv Detail & Related papers (2025-05-19T17:55:32Z) - Efficiently Vectorized MCMC on Modern Accelerators [1.952427698056566]
We show how to design single-chain MCMC algorithms in a way that avoids synchronization overheads when vectorizing with tools like $textttvmap$ by using the framework of finite state machines (FSMs)<n>We implement several popular MCMC algorithms as FSMs, including Slice Sampling, HMC-NUTS and Delayed Rejection, demonstrating speed-ups of up to an order of magnitude.
arXiv Detail & Related papers (2025-03-20T16:07:14Z) - Parallel Simulation for Log-concave Sampling and Score-based Diffusion Models [55.07411490538404]
We propose a novel parallel sampling method that improves adaptive complexity dependence on dimension $d$.<n>Our approach builds on parallel simulation techniques from scientific computing.
arXiv Detail & Related papers (2024-12-10T11:50:46Z) - Measuring quantum relative entropy with finite-size effect [53.64687146666141]
We study the estimation of relative entropy $D(rho|sigma)$ when $sigma$ is known.
Our estimator attains the Cram'er-Rao type bound when the dimension $d$ is fixed.
arXiv Detail & Related papers (2024-06-25T06:07:20Z) - Dividing quantum circuits for time evolution of stochastic processes by orthogonal series density estimation [0.0]
Quantum Monte Carlo integration (QMCI) is a quantum algorithm to estimate expectations of random variables.
In this paper, we propose a method to divide $U_X(t)$ based on orthogonal series density estimation.
Our method achieves the circuit depth and total query complexity scaling as $O(sqrtN)$ and $O(N3/2)$, respectively.
arXiv Detail & Related papers (2024-06-04T01:50:21Z) - Convergence Bounds for Sequential Monte Carlo on Multimodal Distributions using Soft Decomposition [6.872242798058046]
We prove bounds on the variance of a function $f$ under the empirical measure of the samples obtained by the Sequential Monte Carlo (SMC) algorithm.
We show that bounds can be obtained in the truly multi-modal setting, with mixing times that depend on local MCMC dynamics.
arXiv Detail & Related papers (2024-05-29T22:43:45Z) - Freya PAGE: First Optimal Time Complexity for Large-Scale Nonconvex Finite-Sum Optimization with Heterogeneous Asynchronous Computations [92.1840862558718]
In practical distributed systems, workers typically not homogeneous, and can have highly varying processing times.
We introduce a new parallel method Freya to handle arbitrarily slow computations.
We show that Freya offers significantly improved complexity guarantees compared to all previous methods.
arXiv Detail & Related papers (2024-05-24T13:33:30Z) - Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - More Quantum Speedups for Multiproposal MCMC [9.09160682938666]
Multiproposal Markov chain Monte Carlo (MCMC) algorithms choose from multiple proposals at each iteration in order to sample from challenging target distributions more efficiently.
Recent work demonstrates the possibility of quadratic quantum speedups for one such multiproposal MCMC algorithm.
We present a fast new quantum multiproposal MCMC strategy, QPMCMC2, that only requires $mathcalO(1)$ target evaluations and $mathcalO(log P)$ qubits.
arXiv Detail & Related papers (2023-12-03T14:05:08Z) - SpreadNUTS -- Moderate Dynamic Extension of Paths for No-U-Turn Sampling
& Partitioning Visited Regions [0.0]
This paper introduces modifications to a specific Hamiltonian Monte Carlo (HMC) algorithm known as the no-U-turn sampler (NUTS)
NUTS aims to explore the sample space faster than NUTS, yielding a sampler that has faster convergence to the true distribution than NUTS.
arXiv Detail & Related papers (2023-07-09T05:00:25Z) - Bayesian Decision Trees Inspired from Evolutionary Algorithms [64.80360020499555]
We propose a replacement of the Markov Chain Monte Carlo (MCMC) with an inherently parallel algorithm, the Sequential Monte Carlo (SMC)
Experiments show that SMC combined with the Evolutionary Algorithms (EA) can produce more accurate results compared to MCMC in 100 times fewer iterations.
arXiv Detail & Related papers (2023-05-30T06:17:35Z) - Convergence Analysis of Stochastic Gradient Descent with MCMC Estimators [8.493584276672971]
gradient descent (SGD) and its variants is essential for machine learning.
In this paper, we consider the SGD algorithm that employ the Markov Chain Monte Carlo (MCMC) estimator to compute the gradient.
It is shown that MCMC-SGD escapes from saddle points and reaches $(epsilon,epsilon1/4)$ approximate second order stationary points.
arXiv Detail & Related papers (2023-03-19T08:29:49Z) - Parallel Approaches to Accelerate Bayesian Decision Trees [1.9728521995447947]
We propose two methods for exploiting parallelism in the MCMC.
In the first, we replace the MCMC with another numerical Bayesian approach.
In the second, we consider data partitioning.
arXiv Detail & Related papers (2023-01-22T09:56:26Z) - De-Sequentialized Monte Carlo: a parallel-in-time particle smoother [3.97478982737167]
We propose dSMC (de-Sequentialized Monte Carlo), a new particle smoother that is able to process $T$ observations in $mathcalO(log T)$ time.
This compares favourably with standard particle smoothers, the complexity of which is linear in $T$.
We also design a particle Gibbs sampler based on dSMC, which is able to perform parameter inference in a state-space model at a $mathcalO(log(T))$ cost on parallel hardware.
arXiv Detail & Related papers (2022-02-04T17:46:32Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence
of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymotic analysis of sampling error in 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - Randomized Stochastic Variance-Reduced Methods for Stochastic Bilevel
Optimization [62.87181271021217]
We consider non-SBO problems that have many applications in machine learning.
This paper proposes fast randomized algorithms for non-SBO problems.
arXiv Detail & Related papers (2021-05-05T18:28:42Z) - Kernel learning approaches for summarising and combining posterior
similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z) - On Effective Parallelization of Monte Carlo Tree Search [51.15940034629022]
Monte Carlo Tree Search (MCTS) is computationally expensive as it requires a substantial number of rollouts to construct the search tree.
How to design effective parallel MCTS algorithms has not been systematically studied and remains poorly understood.
We demonstrate how proposed necessary conditions can be adopted to design more effective parallel MCTS algorithms.
arXiv Detail & Related papers (2020-06-15T21:36:00Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.