Related papers: Efficiently Vectorized MCMC on Modern Accelerators

Efficiently Vectorized MCMC on Modern Accelerators

URL: http://arxiv.org/abs/2503.17405v1
Date: Thu, 20 Mar 2025 16:07:14 GMT
Title: Efficiently Vectorized MCMC on Modern Accelerators
Authors: Hugh Dance, Pierre Glaser, Peter Orbanz, Ryan Adams,
Abstract summary: We show how to design single-chain MCMC algorithms in a way that avoids synchronization overheads when vectorizing with tools like $textttvmap$ by using the framework of finite state machines (FSMs)<n>We implement several popular MCMC algorithms as FSMs, including Slice Sampling, HMC-NUTS and Delayed Rejection, demonstrating speed-ups of up to an order of magnitude.
Score: 1.952427698056566
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the advent of automatic vectorization tools (e.g., JAX's $\texttt{vmap}$), writing multi-chain MCMC algorithms is often now as simple as invoking those tools on single-chain code. Whilst convenient, for various MCMC algorithms this results in a synchronization problem -- loosely speaking, at each iteration all chains running in parallel must wait until the last chain has finished drawing its sample. In this work, we show how to design single-chain MCMC algorithms in a way that avoids synchronization overheads when vectorizing with tools like $\texttt{vmap}$ by using the framework of finite state machines (FSMs). Using a simplified model, we derive an exact theoretical form of the obtainable speed-ups using our approach, and use it to make principled recommendations for optimal algorithm design. We implement several popular MCMC algorithms as FSMs, including Elliptical Slice Sampling, HMC-NUTS, and Delayed Rejection, demonstrating speed-ups of up to an order of magnitude in experiments.

Related papers

A system identification approach to clustering vector autoregressive time series [50.66782357329375]
Clustering time series based on their underlying dynamics is keeping attracting researchers due to its impacts on assisting complex system modelling.<n>Most current time series clustering methods handle only scalar time series, treat them as white noise, or rely on domain knowledge for high-quality feature construction.<n>Instead of relying on feature/metric construction, the system identification approach allows treating vector time series clustering by explicitly considering their underlying autoregressive dynamics.
arXiv Detail & Related papers (2025-05-20T14:31:44Z)
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster [61.83949316226113]
FastCoT is a model-agnostic framework based on parallel decoding. We show that FastCoT saves inference time by nearly 20% with only a negligible performance drop compared to the regular approach.
arXiv Detail & Related papers (2023-11-14T15:56:18Z)
An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks. The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions. We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
CORE: Common Random Reconstruction for Distributed Optimization with Provable Low Communication Complexity [110.50364486645852]
Communication complexity has become a major bottleneck for speeding up training and scaling up machine numbers. We propose Common Om REOm, which can be used to compress information transmitted between machines.
arXiv Detail & Related papers (2023-09-23T08:45:27Z)
SpreadNUTS -- Moderate Dynamic Extension of Paths for No-U-Turn Sampling & Partitioning Visited Regions [0.0]
This paper introduces modifications to a specific Hamiltonian Monte Carlo (HMC) algorithm known as the no-U-turn sampler (NUTS) NUTS aims to explore the sample space faster than NUTS, yielding a sampler that has faster convergence to the true distribution than NUTS.
arXiv Detail & Related papers (2023-07-09T05:00:25Z)
Bayesian Decision Trees Inspired from Evolutionary Algorithms [64.80360020499555]
We propose a replacement of the Markov Chain Monte Carlo (MCMC) with an inherently parallel algorithm, the Sequential Monte Carlo (SMC) Experiments show that SMC combined with the Evolutionary Algorithms (EA) can produce more accurate results compared to MCMC in 100 times fewer iterations.
arXiv Detail & Related papers (2023-05-30T06:17:35Z)
Parallel Approaches to Accelerate Bayesian Decision Trees [1.9728521995447947]
We propose two methods for exploiting parallelism in the MCMC. In the first, we replace the MCMC with another numerical Bayesian approach. In the second, we consider data partitioning.
arXiv Detail & Related papers (2023-01-22T09:56:26Z)
Single MCMC Chain Parallelisation on Decision Trees [0.9137554315375919]
We propose a method to parallelise a single MCMC decision tree chain on an average laptop or personal computer. Experiments showed that we could achieve 18 times faster running time provided that the serial and the parallel implementation are statistically identical.
arXiv Detail & Related papers (2022-07-26T07:07:51Z)
DG-LMC: A Turn-key and Scalable Synchronous Distributed MCMC Algorithm [21.128416842467132]
We derive a user-friendly centralised distributed MCMC algorithm with provable scaling in high-dimensional settings. We illustrate the relevance of the proposed methodology on both synthetic and real data experiments.
arXiv Detail & Related papers (2021-06-11T10:37:14Z)
Involutive MCMC: a Unifying Framework [64.46316409766764]
We describe a wide range of MCMC algorithms in terms of iMCMC. We formulate a number of "tricks" which one can use as design principles for developing new MCMC algorithms. We demonstrate the latter with two examples where we transform known reversible MCMC algorithms into more efficient irreversible ones.
arXiv Detail & Related papers (2020-06-30T10:21:42Z)
On Effective Parallelization of Monte Carlo Tree Search [51.15940034629022]
Monte Carlo Tree Search (MCTS) is computationally expensive as it requires a substantial number of rollouts to construct the search tree. How to design effective parallel MCTS algorithms has not been systematically studied and remains poorly understood. We demonstrate how proposed necessary conditions can be adopted to design more effective parallel MCTS algorithms.
arXiv Detail & Related papers (2020-06-15T21:36:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.