On Mixing Rates for Bayesian CART
- URL: http://arxiv.org/abs/2306.00126v1
- Date: Wed, 31 May 2023 19:04:28 GMT
- Title: On Mixing Rates for Bayesian CART
- Authors: Jungeum Kim, Veronika Rockova
- Abstract summary: This work focuses on the Bayesian CART algorithm, which forms a building block of Bayesian Additive Regression Trees (BART).
We derive upper bounds on mixing times for typical posteriors under various proposal distributions.
A thorough simulation study highlights discrepancies between spike-and-slab priors and Bayesian CART under a variety of proposals.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of Bayesian inference with MCMC depends critically on Markov
chains rapidly reaching the posterior distribution. Despite the plenitude of
inferential theory for posteriors in Bayesian non-parametrics, convergence
properties of MCMC algorithms that simulate from such ideal inferential targets
are not thoroughly understood. This work focuses on the Bayesian CART algorithm,
which forms a building block of Bayesian Additive Regression Trees (BART). We
derive upper bounds on mixing times for typical posteriors under various
proposal distributions. Exploiting the wavelet representation of trees, we
provide sufficient conditions for Bayesian CART to mix well (polynomially)
under certain hierarchical connectivity restrictions on the signal. We also
derive a negative result showing that Bayesian CART (based on simple grow and
prune steps) cannot reach deep isolated signals faster than in exponential
mixing time. To remedy myopic tree exploration, we propose Twiggy Bayesian
CART which attaches/detaches entire twigs (not just single nodes) in the
proposal distribution. We show polynomial mixing of Twiggy Bayesian CART
without assuming that the signal is connected on a tree. Going further, we show
that informed variants achieve even faster mixing. A thorough simulation study
highlights discrepancies between spike-and-slab priors and Bayesian CART under
a variety of proposals.
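Since the results are stated as bounds on mixing times, it may help to recall the standard total-variation definition such bounds refer to (a standard convention; the paper's exact normalization may differ):

$$t_{\mathrm{mix}}(\varepsilon) = \min\{ t \ge 0 : \max_{\mathcal{T}} \| P^t(\mathcal{T}, \cdot) - \Pi(\cdot \mid Y) \|_{\mathrm{TV}} \le \varepsilon \},$$

where $P$ is the transition kernel of the Bayesian CART chain over trees $\mathcal{T}$ and $\Pi(\cdot \mid Y)$ is the posterior it targets. Polynomial mixing means $t_{\mathrm{mix}}$ grows at most polynomially with the problem size, while the negative result exhibits posteriors for which grow/prune chains need exponentially many steps.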
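To make the moves concrete, here is a minimal, self-contained Python sketch of a Metropolis-Hastings sampler over dyadic trees with grow/prune moves, generalized to attach/detach complete twigs in the spirit of Twiggy Bayesian CART. It is an illustration under simplifying assumptions (one-dimensional dyadic splits on [0, 1), conjugate Gaussian leaf means integrated out, the branching-process tree prior of Chipman et al.), not the paper's exact algorithm; all function names and default parameters here are ours.

```python
import math
import random

# Dyadic regression trees on [0, 1), stored as the set of *internal* node
# indices in heap order: the root is 1 and the children of node k are 2k
# and 2k + 1. The empty set encodes the root-only tree (a single leaf).

def depth(k):
    return k.bit_length() - 1

def leaves(internal):
    if not internal:
        return [1]
    return sorted(c for k in internal for c in (2 * k, 2 * k + 1)
                  if c not in internal)

def cell(k):
    # Dyadic subinterval of [0, 1) owned by node k.
    d = depth(k)
    lo = (k - 2 ** d) / 2 ** d
    return lo, lo + 2.0 ** -d

def log_prior(internal, alpha=0.95, beta=1.5):
    # Branching-process tree prior of Chipman, George & McCulloch (1998):
    # a node at depth d splits with probability alpha / (1 + d)^beta.
    p = lambda k: alpha / (1 + depth(k)) ** beta
    return (sum(math.log(p(k)) for k in internal)
            + sum(math.log(1 - p(k)) for k in leaves(internal)))

def log_mlik(internal, x, y, sigma2=1.0, tau2=1.0):
    # Gaussian errors with a conjugate N(0, tau2) prior on each leaf mean,
    # integrated out in closed form so the chain lives on tree shapes alone.
    ll = 0.0
    for k in leaves(internal):
        lo, hi = cell(k)
        ys = [yi for xi, yi in zip(x, y) if lo <= xi < hi]
        n, s, ss = len(ys), sum(ys), sum(t * t for t in ys)
        v = sigma2 + n * tau2
        ll += (-0.5 * n * math.log(2 * math.pi * sigma2)
               + 0.5 * math.log(sigma2 / v)
               - 0.5 * (ss / sigma2 - tau2 * s * s / (sigma2 * v)))
    return ll

def twig(k, d):
    # Internal nodes of a complete twig of depth d rooted at k, plus the
    # frontier of 2**d nodes that become its leaves.
    nodes, level = set(), [k]
    for _ in range(d):
        nodes |= set(level)
        level = [c for j in level for c in (2 * j, 2 * j + 1)]
    return nodes, level

def attachable(tree, dmax, max_depth):
    # (leaf, twig-depth) pairs attachable without exceeding max_depth.
    return [(k, d) for d in range(1, dmax + 1)
            for k in leaves(tree) if depth(k) + d <= max_depth]

def detachable(tree, dmax):
    # (node, twig-depth) pairs whose complete twig can be removed whole.
    return [(k, d) for d in range(1, dmax + 1) for k in tree
            if twig(k, d)[0] <= tree
            and not any(f in tree for f in twig(k, d)[1])]

def mh_step(internal, x, y, max_depth=6, dmax=1):
    # One Metropolis-Hastings move targeting the posterior restricted to
    # trees of depth <= max_depth. dmax=1 gives plain grow/prune Bayesian
    # CART; dmax > 1 attaches/detaches whole twigs, in the spirit of the
    # paper's Twiggy proposal.
    cur = set(internal)
    if random.random() < 0.5:                      # attach (grow)
        moves = attachable(cur, dmax, max_depth)
        if not moves:
            return cur                             # impossible move: stay put
        k, d = random.choice(moves)
        new = cur | twig(k, d)[0]
        log_q = math.log(len(moves)) - math.log(len(detachable(new, dmax)))
    else:                                          # detach (prune)
        moves = detachable(cur, dmax)
        if not moves:
            return cur
        k, d = random.choice(moves)
        new = cur - twig(k, d)[0]
        log_q = math.log(len(moves)) - math.log(len(attachable(new, dmax, max_depth)))
    log_accept = (log_prior(new) + log_mlik(new, x, y)
                  - log_prior(cur) - log_mlik(cur, x, y) + log_q)
    return new if math.log(random.random()) < log_accept else cur

# Toy demo: a noisy step function with a jump at x = 0.5; the chain should
# settle on the single split at the root (internal nodes ~ {1}).
random.seed(0)
n = 200
xs = [i / n for i in range(n)]
ys = [(1.0 if xi >= 0.5 else -1.0) + random.gauss(0, 0.5) for xi in xs]
tree = set()
for _ in range(2000):
    tree = mh_step(tree, xs, ys, dmax=2)
print("internal nodes:", sorted(tree))
```

With `dmax=1` the chain adds or removes a single internal node per step, which is precisely the myopic grow/prune behavior behind the paper's exponential-mixing negative result; setting `dmax > 1` lets a single accepted move jump across poorly supported intermediate trees toward a deep signal, which is the intuition behind the polynomial mixing of twig-based proposals.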
Related papers
- The Computational Curse of Big Data for Bayesian Additive Regression Trees: A Hitting Time Analysis [8.36826153664925]
We show that the BART sampler often converges slowly, confirming empirical observations by other researchers.
As $n$ increases, the approximate BART posterior becomes increasingly different from the exact posterior.
arXiv Detail & Related papers (2024-06-28T14:45:29Z)
- Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards [14.822625665220068]
We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time-dependent random graphs provided by an environment.
The goal is to minimize the overall regret of the entire system through collaborations.
arXiv Detail & Related papers (2023-06-08T22:37:24Z)
- Bayesian Decision Trees Inspired from Evolutionary Algorithms [64.80360020499555]
We propose replacing Markov Chain Monte Carlo (MCMC) with an inherently parallel algorithm, Sequential Monte Carlo (SMC).
Experiments show that SMC combined with Evolutionary Algorithms (EA) can produce more accurate results than MCMC in 100 times fewer iterations.
arXiv Detail & Related papers (2023-05-30T06:17:35Z)
- Bayesian Pseudo-Coresets via Contrastive Divergence [5.479797073162603]
We introduce a novel approach for constructing Bayesian pseudo-coresets (BPC) by utilizing contrastive divergence.
It eliminates the need for approximations in the pseudo-coreset construction process.
We conduct extensive experiments on multiple datasets, demonstrating its superiority over existing BPC techniques.
arXiv Detail & Related papers (2023-03-20T17:13:50Z)
- A Mixing Time Lower Bound for a Simplified Version of BART [5.149859291357858]
We provide the first lower bound on the mixing time for a simplified version of BART.
Inspired by this new connection between the mixing time and the number of data points, we perform rigorous simulations on BART.
We show qualitatively that BART's mixing time increases with the number of data points.
arXiv Detail & Related papers (2022-10-17T18:45:36Z)
- Online, Informative MCMC Thinning with Kernelized Stein Discrepancy [14.218897034491125]
We propose an MCMC variant that retains only those posterior samples which pass a kernelized Stein discrepancy (KSD) threshold (a toy sketch of this style of thinning appears after the related-papers list).
We establish the convergence and complexity tradeoffs for several settings of KSD Thinning.
Code is available at github.com/colehawkins/KSD-Thinning.
arXiv Detail & Related papers (2022-01-18T17:13:21Z)
- Bayesian decision-making under misspecified priors with applications to meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
arXiv Detail & Related papers (2021-07-03T23:17:26Z)
- What Are Bayesian Neural Network Posteriors Really Like? [63.950151520585024]
We show that Bayesian neural networks sampled with Hamiltonian Monte Carlo (HMC) can achieve significant performance gains over standard training and deep ensembles.
We also show that deep ensemble predictive distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.
arXiv Detail & Related papers (2021-04-29T15:38:46Z)
- On Batch Normalisation for Approximate Bayesian Inference [102.94525205971873]
We show that batch normalisation does not affect the optimum of the evidence lower bound (ELBO).
We also study the Monte Carlo Batch Normalisation (MCBN) algorithm, proposed as an approximate inference technique parallel to MC Dropout.
arXiv Detail & Related papers (2020-12-24T12:40:11Z)
- Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z)
- Distributionally Robust Bayesian Quadrature Optimization [60.383252534861136]
We study BQO under distributional uncertainty in which the underlying probability distribution is unknown except for a limited set of its i.i.d. samples.
A standard BQO approach maximizes the Monte Carlo estimate of the true expected objective given the fixed sample set.
We propose a novel posterior sampling based algorithm, namely distributionally robust BQO (DRBQO) for this purpose.
arXiv Detail & Related papers (2020-01-19T12:00:33Z)
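As promised in the KSD Thinning entry above, here is a toy one-dimensional sketch of thinning a sample stream by kernelized Stein discrepancy. It uses the standard IMQ Stein kernel and a greedy admit/swap rule as one plausible thresholding scheme; it is not the authors' exact update rule, and the names `ksd_thin`, `budget`, and the Gaussian example are our illustration (the paper's online variants also avoid this sketch's quadratic recomputation).

```python
import math
import random

def stein_kernel(x, y, sx, sy, c2=1.0, beta=0.5):
    # Langevin Stein kernel built from the IMQ base kernel
    # k(x, y) = (c2 + (x - y)^2)^(-beta); sx and sy are the scores
    # (d/dx) log p evaluated at x and y. One-dimensional for simplicity.
    r2 = (x - y) ** 2
    u = c2 + r2
    k = u ** -beta
    dxk = -2 * beta * (x - y) * u ** (-beta - 1)   # dk/dx (= -dk/dy)
    trace = (2 * beta * u ** (-beta - 1)
             - 4 * beta * (beta + 1) * r2 * u ** (-beta - 2))
    return trace + dxk * sy - dxk * sx + k * sx * sy

def ksd2(points, score):
    # Squared KSD of the empirical measure on `points` (a V-statistic,
    # quadratic in the number of retained points).
    s = [score(p) for p in points]
    n = len(points)
    return sum(stein_kernel(points[i], points[j], s[i], s[j])
               for i in range(n) for j in range(n)) / n ** 2

def ksd_thin(stream, score, budget=15):
    # Greedy thinning: admit a sample only if it lowers the running KSD;
    # once the budget is full, admit it only through a KSD-improving swap.
    kept = []
    for x in stream:
        if len(kept) < 2:
            kept.append(x)
            continue
        base = ksd2(kept, score)
        if len(kept) < budget:
            if ksd2(kept + [x], score) < base:
                kept.append(x)
        else:
            swaps = [kept[:i] + kept[i + 1:] + [x] for i in range(len(kept))]
            best = min(swaps, key=lambda c: ksd2(c, score))
            if ksd2(best, score) < base:
                kept = best
    return kept

# Toy usage: thin an over-dispersed "chain" targeting N(0, 1).
random.seed(1)
score_fn = lambda x: -x                            # score of N(0, 1)
stream = [random.gauss(0, 1.3) for _ in range(150)]
print(sorted(round(v, 2) for v in ksd_thin(stream, score_fn)))
```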
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.