BFTS: Thompson Sampling with Bayesian Additive Regression Trees
- URL: http://arxiv.org/abs/2602.07767v1
- Date: Sun, 08 Feb 2026 01:54:56 GMT
- Title: BFTS: Thompson Sampling with Bayesian Additive Regression Trees
- Authors: Ruizhe Deng, Bibhas Chakraborty, Ran Chen, Yan Shuo Tan,
- Abstract summary: Contextual bandits are a core technology for personalized mobile health interventions.<n>We propose the first contextual bandit algorithm to integrate Bayesian Additive Regression Trees (BART) directly into the exploration loop.<n>In an offline policy evaluation, BFTS improves engagement rates by over 30% compared to the deployed policy.
- Score: 3.5914503354050336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contextual bandits are a core technology for personalized mobile health interventions, where decision-making requires adapting to complex, non-linear user behaviors. While Thompson Sampling (TS) is a preferred strategy for these problems, its performance hinges on the quality of the underlying reward model. Standard linear models suffer from high bias, while neural network approaches are often brittle and difficult to tune in online settings. Conversely, tree ensembles dominate tabular data prediction but typically rely on heuristic uncertainty quantification, lacking a principled probabilistic basis for TS. We propose Bayesian Forest Thompson Sampling (BFTS), the first contextual bandit algorithm to integrate Bayesian Additive Regression Trees (BART), a fully probabilistic sum-of-trees model, directly into the exploration loop. We prove that BFTS is theoretically sound, deriving an information-theoretic Bayesian regret bound of $\tilde{O}(\sqrt{T})$. As a complementary result, we establish frequentist minimax optimality for a "feel-good" variant, confirming the structural suitability of BART priors for non-parametric bandits. Empirically, BFTS achieves state-of-the-art regret on tabular benchmarks with near-nominal uncertainty calibration. Furthermore, in an offline policy evaluation on the Drink Less micro-randomized trial, BFTS improves engagement rates by over 30% compared to the deployed policy, demonstrating its practical effectiveness for behavioral interventions.
Related papers
- Richer Bayesian Last Layers with Subsampled NTK Features [25.566044416945875]
Bayesian Last Layers (BLLs) provide a convenient and computationally efficient way to estimate uncertainty in neural networks.<n>We propose a method that improves BLLs by leveraging a projection of Neural Tangent Kernel (NTK) features onto the space spanned by the last-layer features.<n>This enables posterior inference that accounts for variability of the full network while retaining the low computational cost of inference of a standard BLL.
arXiv Detail & Related papers (2026-02-01T15:24:20Z) - PAC-Bayes Meets Online Contextual Optimization [4.004966432215451]
This work introduces, to the best of our knowledge, the first online contextual optimization framework.<n>Grounded in PAC-Bayes theory and general Bayesian updating principles, our framework achieves $mathcalO(sqrtT)$ regret for bounded and mixable losses via a Gibbs posterior.
arXiv Detail & Related papers (2025-11-25T15:37:31Z) - Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update [70.38810219913593]
We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function.<n>GLBs are widely applicable to real-world scenarios, but their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency.<n>We propose a jointly efficient algorithm that attains a nearly optimal regret bound with $mathcalO(1)$ time and space complexities per round.
arXiv Detail & Related papers (2025-07-16T02:24:21Z) - BAPE: Learning an Explicit Bayes Classifier for Long-tailed Visual Recognition [78.70453964041718]
Current deep learning algorithms usually solve for the optimal classifier by emphimplicitly estimating the posterior probabilities.<n>This simple methodology has been proven effective for meticulously balanced academic benchmark datasets.<n>However, it is not applicable to the long-tailed data distributions in the real world.<n>This paper presents a novel approach (BAPE) that provides a more precise theoretical estimation of the data distributions.
arXiv Detail & Related papers (2025-06-29T15:12:50Z) - Batched Nonparametric Bandits via k-Nearest Neighbor UCB [0.0]
We study sequential decision-making in batched nonparametric contextual bandits.<n>We propose a nonparametric algorithm that combines adaptive k-nearest neighbor (k-NN) regression with the upper confidence bound (UCB) principle.<n>Our method, BaNk-UCB, is fully nonparametric, adapts to the context dimension, and is simple to implement.
arXiv Detail & Related papers (2025-05-15T17:00:51Z) - Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience.
We propose the first online continuous hyperparameter tuning framework for contextual bandits.
We show that it could achieve a sublinear regret in theory and performs consistently better than all existing methods on both synthetic and real datasets.
arXiv Detail & Related papers (2023-02-18T23:31:20Z) - Sample-Efficient Optimisation with Probabilistic Transformer Surrogates [66.98962321504085]
This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation.
We observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation.
We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser trading-off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance.
arXiv Detail & Related papers (2022-05-27T11:13:17Z) - GP-BART: a novel Bayesian additive regression trees approach using
Gaussian processes [1.03590082373586]
The GP-BART model is an extension of BART which addresses the limitation by assuming GP priors for the predictions of each terminal node among all trees.
The model's effectiveness is demonstrated through applications to simulated and real-world data, surpassing the performance of traditional modeling approaches in various scenarios.
arXiv Detail & Related papers (2022-04-05T11:18:44Z) - Post-Contextual-Bandit Inference [57.88785630755165]
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good or even best policies.
To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies.
arXiv Detail & Related papers (2021-06-01T12:01:51Z) - Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS)
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z) - Deep Bandits Show-Off: Simple and Efficient Exploration with Deep
Networks [14.178899938667161]
We introduce Sample Average Uncertainty (SAU), a simple and efficient uncertainty measure for contextual bandits.
Because of its simplicity SAU can be seamlessly applied to deep contextual bandits as a very scalable drop-in replacement for epsilon-greedy exploration.
arXiv Detail & Related papers (2021-05-10T21:45:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.