The Computational Curse of Big Data for Bayesian Additive Regression Trees: A Hitting Time Analysis
- URL: http://arxiv.org/abs/2406.19958v1
- Date: Fri, 28 Jun 2024 14:45:29 GMT
- Title: The Computational Curse of Big Data for Bayesian Additive Regression Trees: A Hitting Time Analysis
- Authors: Yan Shuo Tan, Omer Ronen, Theo Saarinen, Bin Yu
- Abstract summary: We show that the BART sampler often converges slowly, confirming empirical observations by other researchers.
As $n$ increases, the approximate BART posterior becomes increasingly different from the exact posterior.
- Score: 8.36826153664925
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bayesian Additive Regression Trees (BART) is a popular Bayesian non-parametric regression model that is commonly used in causal inference and beyond. Its strong predictive performance is supported by theoretical guarantees that its posterior distribution concentrates around the true regression function at optimal rates under various data generative settings and for appropriate prior choices. In this paper, we show that the BART sampler often converges slowly, confirming empirical observations by other researchers. Assuming discrete covariates, we show that, while the BART posterior concentrates on a set comprising all optimal tree structures (smallest bias and complexity), the Markov chain's hitting time for this set increases with $n$ (the training sample size) under several common data generative settings. As $n$ increases, the approximate BART posterior thus becomes increasingly different from the exact posterior (for the same number of MCMC samples), contrasting with earlier concentration results on the exact posterior. This contrast is highlighted by our simulations, which show worsening frequentist undercoverage of approximate posterior intervals and a growing ratio between the MSE of the approximate posterior and that obtainable by artificially improving convergence via averaging multiple sampler chains. Finally, based on our theoretical insights, we discuss possibilities for improving the convergence of the BART sampler.
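The slow-mixing mechanism in the abstract can be caricatured without BART itself. Below is a minimal toy sketch, not the paper's analysis or the actual BART sampler: a Metropolis chain on three states, where a "valley" state M separating a local mode A from the global mode B has posterior weight $e^{-cn}$, so the expected hitting time of B grows exponentially with $n$. The three states, the weight 10.0 for B, and the constant c = 0.5 are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def hitting_time(n, c=0.5, max_steps=10_000_000):
    """Steps for a Metropolis chain to first reach the global mode B,
    starting from the local mode A, on the path A -- M -- B, where the
    valley state M has (unnormalized) posterior weight exp(-c * n)."""
    pi = {"A": 1.0, "M": np.exp(-c * n), "B": 10.0}   # unnormalized posterior
    neighbors = {"A": ["M"], "M": ["A", "B"], "B": ["M"]}
    state, t = "A", 0
    while state != "B" and t < max_steps:
        prop = rng.choice(neighbors[state])           # uniform proposal
        if rng.random() < min(1.0, pi[prop] / pi[state]):  # Metropolis accept
            state = prop
        t += 1
    return t

for n in [5, 10, 15, 20]:
    times = [hitting_time(n) for _ in range(20)]
    print(f"n = {n:2d}: mean hitting time ~ {np.mean(times):,.0f}")
```

As $n$ grows, the chain must wait roughly $e^{cn}$ steps to cross the valley, mirroring the paper's qualitative finding that a fixed MCMC budget covers a shrinking fraction of the hitting time.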
Related papers
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting.
We validate our theory across a variety of high-dimensional data.
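As a rough illustration of this entry's setup, the sketch below estimates the in- and out-of-sample risks of ridge regression by Monte Carlo when the training noise is correlated across samples. It is not the paper's asymptotic theory; the AR(1) correlation, dimensions, and ridge penalty are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def ridge_risks(n=200, p=50, lam=1.0, rho=0.5, reps=200):
    """Monte Carlo in-/out-of-sample risk of ridge regression when the
    training noise is AR(1)-correlated across samples (correlation rho)."""
    beta = rng.standard_normal(p) / np.sqrt(p)        # fixed true signal
    # AR(1) covariance across the n training samples: C[i, j] = rho^|i - j|
    C = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    L = np.linalg.cholesky(C)
    in_risk, out_risk = 0.0, 0.0
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        y = X @ beta + L @ rng.standard_normal(n)     # correlated noise
        beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
        in_risk += np.mean((X @ (beta_hat - beta)) ** 2)
        x_new = rng.standard_normal(p)                # independent test point
        out_risk += (x_new @ (beta_hat - beta)) ** 2
    return in_risk / reps, out_risk / reps

print(ridge_risks())
```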
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
- Batch and match: black-box variational inference with a score-based divergence [26.873037094654826]
We propose batch and match (BaM) as an alternative approach to black-box variational inference (BBVI) based on a score-based divergence.
We show that BaM converges in fewer gradient evaluations than leading implementations of BBVI based on ELBO maximization.
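BaM itself uses closed-form updates; the sketch below only illustrates the underlying score-matching idea for a Gaussian variational family, on a toy 2-D Gaussian target whose score is known in closed form. The target, sample design, and least-squares fit are illustrative assumptions, not the paper's algorithm. For $q = N(m, S)$ the score is $-S^{-1}(x - m)$, so matching it to the target score is a linear regression $g \approx Ax + b$ with $A = -S^{-1}$ and $b = S^{-1}m$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Target: a 2-D Gaussian with known score function grad log p(x).
mu_p = np.array([1.0, -2.0])
Sigma_p = np.array([[2.0, 0.6], [0.6, 1.0]])
P = np.linalg.inv(Sigma_p)
score_p = lambda x: -(x - mu_p) @ P            # rows of x are points

# Fit q = N(m, S) by least-squares score matching: g ~= A x + b.
x = rng.standard_normal((500, 2)) * 2.0
g = score_p(x)
X1 = np.hstack([x, np.ones((500, 1))])         # design matrix [x, 1]
coef, *_ = np.linalg.lstsq(X1, g, rcond=None)  # stacked [A^T; b]
A, b = coef[:2].T, coef[2]
S = -np.linalg.inv(A)                          # recover covariance
m = S @ b                                      # recover mean
print("fitted mean:", m)                       # ~ mu_p
print("fitted cov:\n", S)                      # ~ Sigma_p
```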
arXiv Detail & Related papers (2024-02-22T18:20:22Z)
- Co-data Learning for Bayesian Additive Regression Trees [0.0]
We propose to incorporate co-data into a sum-of-trees prediction model.
The proposed method can handle multiple types of co-data simultaneously.
Co-data enhances prediction in an application to diffuse large B-cell lymphoma prognosis.
arXiv Detail & Related papers (2023-11-16T16:14:39Z)
- Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
- Theory of Posterior Concentration for Generalized Bayesian Additive Regression Trees [0.685316573653194]
We describe a generalized framework for Bayesian trees and their additive ensembles.
We derive sufficient conditions on the response distribution, under which the posterior concentrates at a minimax rate, up to a logarithmic factor.
arXiv Detail & Related papers (2023-04-25T00:52:48Z)
- A Mixing Time Lower Bound for a Simplified Version of BART [5.149859291357858]
We provide the first lower bound on the mixing time for a simplified version of BART.
Inspired by this new connection between the mixing time and the number of data points, we perform rigorous simulations on BART.
We show qualitatively that BART's mixing time increases with the number of data points.
arXiv Detail & Related papers (2022-10-17T18:45:36Z)
- Adversarial Bayesian Simulation [0.9137554315375922]
We bridge approximate Bayesian computation (ABC) with deep neural implicit samplers based on generative adversarial networks (GANs) and adversarial variational Bayes.
We develop a Bayesian GAN that directly targets the posterior by solving an adversarial optimization problem.
We show that the typical total variation distance between the true and approximate posteriors converges to zero for certain neural network generators and discriminators.
arXiv Detail & Related papers (2022-08-25T14:18:39Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the gradients.
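A minimal sketch of the clipped-gradient-descent idea for streaming mean estimation follows; it is not the paper's algorithm or its tuning. Each sample is seen once, its gradient is clipped in norm to guard against heavy-tailed outliers, and the clipping threshold, step-size schedule, and Student-t data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def clipped_sgd_mean(samples, clip=5.0, lr0=1.0):
    """Streaming mean estimation: each sample contributes the gradient
    (theta - x) of 0.5 * ||theta - x||^2, clipped in Euclidean norm."""
    theta = np.zeros(samples.shape[1])
    for t, x in enumerate(samples, start=1):
        grad = theta - x
        norm = np.linalg.norm(grad)
        if norm > clip:
            grad *= clip / norm               # norm clipping
        theta -= (lr0 / t) * grad             # decaying step size
    return theta

# Heavy-tailed data: Student-t with 2.5 degrees of freedom, true mean 0.
samples = rng.standard_t(df=2.5, size=(20_000, 5))
print("clipped SGD estimate:", clipped_sgd_mean(samples))
print("plain sample mean:   ", samples.mean(axis=0))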
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Instance-Optimal Compressed Sensing via Posterior Sampling [101.43899352984774]
We show that, for Gaussian measurements and any prior distribution on the signal, the posterior sampling estimator achieves near-optimal recovery guarantees.
We implement the posterior sampling estimator for deep generative priors using Langevin dynamics, and empirically find that it produces accurate estimates with more diversity than MAP.
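The paper pairs posterior sampling with deep generative priors; in the minimal sketch below, a standard Gaussian prior stands in so the posterior score is available in closed form, and unadjusted Langevin dynamics samples it. The dimensions, noise level, step size, and burn-in length are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Compressed-sensing toy: recover x from y = A x + noise via Langevin
# dynamics on the posterior, with a standard Gaussian prior on x.
d, m, sigma = 50, 25, 0.1
A = rng.standard_normal((m, d)) / np.sqrt(m)   # Gaussian measurement matrix
x_true = rng.standard_normal(d)
y = A @ x_true + sigma * rng.standard_normal(m)

def grad_log_post(x):
    # log posterior = -||y - A x||^2 / (2 sigma^2) - ||x||^2 / 2 + const
    return A.T @ (y - A @ x) / sigma**2 - x

eta, x = 1e-4, np.zeros(d)                     # step size, initial point
samples = []
for t in range(20_000):
    x = x + eta * grad_log_post(x) + np.sqrt(2 * eta) * rng.standard_normal(d)
    if t >= 10_000:                            # discard burn-in
        samples.append(x.copy())
x_post = np.mean(samples, axis=0)              # posterior-mean estimate
print("relative error:", np.linalg.norm(x_post - x_true) / np.linalg.norm(x_true))
```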
arXiv Detail & Related papers (2021-06-21T22:51:56Z)
- What Are Bayesian Neural Network Posteriors Really Like? [63.950151520585024]
We show that Hamiltonian Monte Carlo can achieve significant performance gains over standard and deep ensembles.
We also show that deep ensemble predictive distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.
arXiv Detail & Related papers (2021-04-29T15:38:46Z)
- SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)