Many processors, little time: MCMC for partitions via optimal transport couplings
- URL: http://arxiv.org/abs/2202.11258v1
- Date: Wed, 23 Feb 2022 01:20:13 GMT
- Title: Many processors, little time: MCMC for partitions via optimal transport couplings
- Authors: Tin D. Nguyen and Brian L. Trippe and Tamara Broderick
- Abstract summary: We show that straightforward applications of existing coupling ideas to discrete clustering variables fail to produce chains that meet quickly.
This failure arises from the "label-switching problem": semantically equivalent cluster relabelings impede fast meeting of coupled chains.
Using a metric on the partition space, we formulate a practical algorithm using optimal transport couplings.
- Score: 17.248439315751227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Markov chain Monte Carlo (MCMC) methods are often used in clustering since
they guarantee asymptotically exact expectations in the infinite-time limit. In
finite time, though, slow mixing often leads to poor performance. Modern
computing environments offer massive parallelism, but naive implementations of
parallel MCMC can exhibit substantial bias. In MCMC samplers of continuous
random variables, Markov chain couplings can overcome bias. But these
approaches depend crucially on paired chains meeting after a small number of
transitions. We show that straightforward applications of existing coupling
ideas to discrete clustering variables fail to produce chains that meet
quickly. This failure
arises from the "label-switching problem": semantically equivalent cluster
relabelings impede fast meeting of coupled chains. We instead consider chains
as exploring the space of partitions rather than partitions' (arbitrary)
labelings. Using a metric on the partition space, we formulate a practical
algorithm using optimal transport couplings. Our theory confirms our method is
accurate and efficient. In experiments ranging from clustering of genes or
seeds to graph colorings, we show the benefits of our coupling in the highly
parallel, time-limited regime.
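To make the coupling concrete, here is a minimal sketch of a single coupled update step, assuming each chain proposes a new cluster assignment for one data point from its categorical full conditional and the pair of assignments is drawn from the optimal transport plan between the two conditionals. The cost matrix stands in for the change in the partition metric; the function names, the scipy-based solver, and the toy cost are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def ot_coupling(p, q, cost):
    """Optimal transport plan between discrete distributions p (length n)
    and q (length m) under an n-by-m cost matrix: minimize <cost, gamma>
    subject to gamma @ 1 = p, gamma.T @ 1 = q, gamma >= 0."""
    n, m = len(p), len(q)
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0  # row sums must equal p
    for j in range(m):
        A_eq[n + j, j::m] = 1.0           # column sums must equal q
    res = linprog(cost.ravel(), A_eq=A_eq,
                  b_eq=np.concatenate([p, q]),
                  bounds=[(0, None)] * (n * m), method="highs")
    return res.x.reshape(n, m)

def coupled_assignment_step(rng, p, q, cost):
    """Draw a pair of cluster assignments, one per chain: each is
    marginally distributed per its own full conditional, but the joint
    draw favors low-cost pairs, pulling the two partitions together."""
    gamma = np.clip(ot_coupling(p, q, cost), 0.0, None)
    idx = rng.choice(gamma.size, p=gamma.ravel() / gamma.sum())
    return idx // cost.shape[1], idx % cost.shape[1]

# Toy usage: three clusters; zero cost when the assignments agree.
rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])
q = np.array([0.5, 0.2, 0.3])
cost = 1.0 - np.eye(3)  # stand-in for the change in partition distance
print(coupled_assignment_step(rng, p, q, cost))
```

Because each chain's marginal transition kernel is preserved, the coupled sampler retains the validity of the individual chains, while the cost-aware joint draws are what encourage fast meeting despite label switching.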
Related papers
- Freya PAGE: First Optimal Time Complexity for Large-Scale Nonconvex Finite-Sum Optimization with Heterogeneous Asynchronous Computations [92.1840862558718]
In practical distributed systems, workers are typically not homogeneous and can have highly varying processing times.
We introduce a new parallel method Freya to handle arbitrarily slow computations.
We show that Freya offers significantly improved complexity guarantees compared to all previous methods.
arXiv Detail & Related papers (2024-05-24T13:33:30Z)
- Adversarial Schrödinger Bridge Matching [66.39774923893103]
The Iterative Markovian Fitting (IMF) procedure alternates between Markovian and reciprocal projections of continuous-time processes.
We propose a novel Discrete-time IMF (D-IMF) procedure in which learning of processes is replaced by learning just a few transition probabilities in discrete time.
We show that our D-IMF procedure can provide the same quality of unpaired domain translation as the IMF, using only a few generation steps instead of hundreds.
arXiv Detail & Related papers (2024-05-23T11:29:33Z)
- Distributed MCMC inference for Bayesian Non-Parametric Latent Block Model [0.24578723416255754]
We introduce a novel Distributed Markov Chain Monte Carlo (MCMC) inference method for the Bayesian Non-Parametric Latent Block Model (DisNPLBM).
Our non-parametric co-clustering algorithm divides observations and features into partitions using latent multivariate Gaussian block distributions.
Experimental results demonstrate DisNPLBM's impact on cluster labeling accuracy and execution time.
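For background, the latent block model at the core of this approach (standard notation, assumed here rather than quoted from the paper) gives each data-matrix entry a block-specific Gaussian distribution determined by its row cluster $z_i$ and column cluster $w_j$:
$$ x_{ij} \mid z_i = k,\; w_j = \ell \;\sim\; \mathcal{N}(\mu_{k\ell}, \Sigma_{k\ell}), $$
with a non-parametric prior on the row and column partitions so the number of blocks need not be fixed in advance.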
arXiv Detail & Related papers (2024-02-01T22:43:55Z)
- SpreadNUTS -- Moderate Dynamic Extension of Paths for No-U-Turn Sampling & Partitioning Visited Regions [0.0]
This paper introduces modifications to a specific Hamiltonian Monte Carlo (HMC) algorithm known as the no-U-turn sampler (NUTS).
The resulting sampler, SpreadNUTS, aims to explore the sample space faster than NUTS and thus converge to the true distribution more quickly.
arXiv Detail & Related papers (2023-07-09T05:00:25Z)
- Improved Bound for Mixing Time of Parallel Tempering [4.68299658663016]
We present a new lower bound for parallel tempering on the spectral gap that has a polynomial dependence on all parameters except $\log L$.
This improves the best existing bound, which depends exponentially on the number of modes.
We complement our result with a hypothetical upper bound on the spectral gap that has an exponential dependence on $\log L$, which shows that, in some sense, our bound is tight.
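For context, the replica-swap move underlying parallel tempering (standard background, not a contribution of this paper) exchanges the states at inverse temperatures $\beta_i$ and $\beta_j$ with probability
$$ \alpha_{\mathrm{swap}} = \min\!\left(1,\; \exp\!\big[(\beta_i - \beta_j)\big(U(x_i) - U(x_j)\big)\big]\right), $$
where $U$ is the potential of the target $\pi_\beta(x) \propto e^{-\beta U(x)}$; the spectral-gap bounds above quantify how quickly this swap dynamics mixes across the temperature ladder.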
arXiv Detail & Related papers (2023-04-03T19:03:22Z)
- Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic [61.968469104271676]
We propose an RL methodology attuned to the mixing time by employing a multi-level Monte Carlo estimator for the critic, the actor, and the average reward embedded within an actor-critic (AC) algorithm.
We experimentally show that these alleviated restrictions on the technical conditions required for stability translate to superior performance in practice for RL problems with sparse rewards.
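A minimal sketch of the multi-level Monte Carlo idea for estimating a long-run mean from correlated samples; `sample_chain` is a hypothetical stand-in for rolling out the policy, and this is the generic estimator rather than the paper's full actor-critic construction.

```python
import numpy as np

def mlmc_mean(sample_chain, rng, j_max=10):
    """MLMC estimator whose expectation matches an average over
    2**j_max correlated samples while drawing only O(j_max) samples
    in expectation: draw J ~ Geometric(1/2), so P(J = j) = 2**-j,
    and add the telescoping correction 2**J * (g_J - g_{J-1}),
    where g_k is the average of the first 2**k samples."""
    j = rng.geometric(0.5)
    if j > j_max:                  # truncated level: correction is zero
        return sample_chain(1).mean()
    xs = sample_chain(2 ** j)
    g = lambda k: xs[: 2 ** k].mean()
    return g(0) + (2 ** j) * (g(j) - g(j - 1))

# Toy usage: i.i.d. draws stand in for the Markov chain.
rng = np.random.default_rng(1)
draws = [mlmc_mean(lambda n: rng.normal(3.0, 1.0, n), rng)
         for _ in range(2000)]
print(np.mean(draws))  # close to the true mean, 3.0
```

The telescoping weights cancel level by level, so the estimator tracks the accuracy of a long average at the expected cost of a short one, which is what lets the method adapt to an unknown mixing time.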
arXiv Detail & Related papers (2023-01-28T04:12:56Z)
- Parallel Approaches to Accelerate Bayesian Decision Trees [1.9728521995447947]
We propose two methods for exploiting parallelism in the MCMC.
In the first, we replace the MCMC with another numerical Bayesian approach.
In the second, we consider data partitioning.
arXiv Detail & Related papers (2023-01-22T09:56:26Z)
- Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms is not only simple and efficient but also applicable to a broad range of contexts.
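In the unconstrained Euclidean case the idea reduces to a damped Hamiltonian flow; the sketch below (a conformal-symplectic Euler discretization, with the paper's manifold constraints deliberately omitted) shows friction draining energy until the trajectory settles at a minimizer.

```python
import numpy as np

def dissipative_hamiltonian_descent(grad, x0, step=0.1, friction=1.0,
                                    iters=500):
    """Minimize f by discretizing dissipative Hamiltonian dynamics:
    momentum is contracted by exp(-friction * step) each iteration
    (the dissipation), then kicked by the gradient and used to move x."""
    x = np.asarray(x0, dtype=float)
    p = np.zeros_like(x)
    decay = np.exp(-friction * step)    # per-step momentum contraction
    for _ in range(iters):
        p = decay * p - step * grad(x)  # damped momentum update
        x = x + step * p                # position update
    return x

# Example: minimize f(x) = 0.5 * ||x||^2, so grad f(x) = x.
print(dissipative_hamiltonian_descent(lambda x: x, x0=[3.0, -2.0]))
```

With zero friction this is a symplectic integrator that conserves energy and never settles; the dissipation term is exactly what turns the dynamics into an optimizer.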
arXiv Detail & Related papers (2021-07-23T13:43:34Z)
- What Are Bayesian Neural Network Posteriors Really Like? [63.950151520585024]
We show that Bayesian neural networks can achieve significant performance gains over standard training and deep ensembles.
We also show that deep ensemble predictive distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.
arXiv Detail & Related papers (2021-04-29T15:38:46Z)
- Contrastive learning of strong-mixing continuous-time stochastic processes [53.82893653745542]
Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data.
We show that a properly constructed contrastive learning task can be used to estimate the transition kernel for small-to-mid-range intervals in the diffusion case.
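The mechanism rests on a standard noise-contrastive identity (background, assuming balanced classes, not quoted from the paper): train a classifier to distinguish true pairs $(X_0, X_\Delta)$ of the process from independent pairs drawn from the marginals; the Bayes-optimal classifier $g^\star$ then recovers the transition kernel via
$$ g^\star(x, y) = \frac{p_\Delta(x, y)}{p_\Delta(x, y) + p(x)\,p(y)} \quad\Longrightarrow\quad p_\Delta(y \mid x) = p(y)\,\frac{g^\star(x, y)}{1 - g^\star(x, y)}, $$
so a well-trained contrastive model yields an estimate of $p_\Delta(y \mid x)$ up to the quality of the classifier.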
arXiv Detail & Related papers (2021-03-03T23:06:47Z)
- AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC [37.768023232677244]
Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is an efficient method for sampling from continuous distributions.
We propose AMAGOLD, a novel second-order SG-MCMC algorithm that infrequently applies Metropolis-Hastings (M-H) corrections to remove bias.
We prove AMAGOLD converges to the target distribution with a fixed, rather than a diminishing, step size, and that its convergence rate is at most a constant factor slower than a full-batch baseline.
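A caricature of the amortization pattern, not AMAGOLD's exact scheme: AMAGOLD accumulates the energy difference from the minibatch gradients along the trajectory, whereas this sketch runs several stochastic-gradient leapfrog steps and then pays for one exact (full-batch) M-H correction per segment.

```python
import numpy as np

def amortized_mh_segment(rng, x, log_p, stoch_grad, step=0.01, n_leap=20):
    """Run n_leap leapfrog steps driven by a (possibly noisy) gradient
    of log p, then accept or reject the whole segment in a single
    M-H test, so the exact correction is paid once per segment
    rather than once per step."""
    p = rng.normal(size=x.shape)
    x_new, p_new = x.copy(), p.copy()
    p_new = p_new + 0.5 * step * stoch_grad(x_new)   # half kick
    for _ in range(n_leap - 1):
        x_new = x_new + step * p_new                 # drift
        p_new = p_new + step * stoch_grad(x_new)     # full (noisy) kick
    x_new = x_new + step * p_new
    p_new = p_new + 0.5 * step * stoch_grad(x_new)   # final half kick
    # Single amortized correction with the exact log-density.
    log_accept = (log_p(x_new) - 0.5 * p_new @ p_new) \
                 - (log_p(x) - 0.5 * p @ p)
    return x_new if np.log(rng.uniform()) < log_accept else x

# Toy usage: standard normal target with exact gradients.
rng = np.random.default_rng(2)
x = np.zeros(2)
for _ in range(100):
    x = amortized_mh_segment(rng, x, lambda v: -0.5 * v @ v, lambda v: -v)
```

Note that with genuinely noisy gradients the trajectory is no longer exactly reversible, which is the subtlety AMAGOLD's construction is designed to handle; the sketch only conveys the once-per-segment correction pattern.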
arXiv Detail & Related papers (2020-02-29T06:57:43Z)