Thompson sampling for zero-inflated count outcomes with an application to the Drink Less mobile health study
- URL: http://arxiv.org/abs/2311.14359v2
- Date: Tue, 30 Jul 2024 03:15:26 GMT
- Title: Thompson sampling for zero-inflated count outcomes with an application to the Drink Less mobile health study
- Authors: Xueqing Liu, Nina Deliu, Tanujit Chakraborty, Lauren Bell, Bibhas Chakraborty
- Abstract summary: Mobile health interventions aim to improve distal outcomes, such as clinical conditions, by optimizing proximal outcomes through just-in-time adaptive interventions.
Contextual bandits provide a suitable framework for customizing such interventions according to individual time-varying contexts.
The current work addresses this challenge by integrating count data models into online decision-making approaches.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mobile health (mHealth) interventions often aim to improve distal outcomes, such as clinical conditions, by optimizing proximal outcomes through just-in-time adaptive interventions. Contextual bandits provide a suitable framework for customizing such interventions according to individual time-varying contexts. However, unique challenges, such as modeling count outcomes within bandit frameworks, have hindered the widespread application of contextual bandits to mHealth studies. The current work addresses this challenge by integrating count data models into online decision-making approaches. Specifically, we combine four common offline count data models (Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regressions) with Thompson sampling, a popular contextual bandit algorithm. The proposed algorithms are motivated by and evaluated on a real dataset from the Drink Less trial, where they are shown to improve user engagement with the mHealth platform. The proposed methods are further evaluated on simulated data, achieving improvements in cumulative proximal outcomes over existing algorithms. Theoretical results on regret bounds are also derived. The countts R package provides an implementation of our approach.
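To make the core idea concrete, below is a minimal, non-contextual sketch of Thompson sampling with a zero-inflated Poisson reward model: a short Gibbs run over latent structural-zero indicators, with conjugate Beta/Gamma updates for the zero-inflation probability and the Poisson rate. This is an illustrative toy, not the countts implementation (which handles contexts and all four count models); the priors, Gibbs length, and two-arm simulation are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_zip_posterior(y, n_gibbs=20, a0=1.0, b0=1.0, c0=1.0, d0=1.0):
    """Draw one (pi, lam) sample from a zero-inflated Poisson posterior.

    Gibbs over latent indicators z_i (z_i = 1 means observation i is a
    structural zero); pi gets a Beta prior, the Poisson rate lam a Gamma prior.
    """
    y = np.asarray(y, dtype=float)
    pi, lam = 0.5, 1.0
    for _ in range(n_gibbs):
        # Only zero counts can be structural zeros.
        p_struct = pi / (pi + (1.0 - pi) * np.exp(-lam))
        z = np.where(y == 0, rng.random(len(y)) < p_struct, False)
        # Conjugate updates given the latent indicators.
        pi = rng.beta(a0 + z.sum(), b0 + len(y) - z.sum())
        lam = rng.gamma(c0 + y[~z].sum(), 1.0 / (d0 + (~z).sum()))
    return pi, lam

def choose_arm(histories):
    """Thompson sampling: sample each arm's posterior, act greedily
    on the sampled expected count (1 - pi) * lam."""
    draws = [sample_zip_posterior(y) for y in histories]
    return int(np.argmax([(1.0 - pi) * lam for pi, lam in draws]))

# Toy two-arm simulation: arm 1 has more structural zeros but a higher mean.
true_params = [(0.3, 2.0), (0.6, 8.0)]          # (pi, lam); means 1.4 vs 3.2
histories = [[], []]
for t in range(300):
    arm = choose_arm(histories)
    pi, lam = true_params[arm]
    reward = 0 if rng.random() < pi else int(rng.poisson(lam))
    histories[arm].append(reward)
print("pulls per arm:", [len(h) for h in histories])
```

With an empty history the Gibbs updates reduce to prior draws, so unplayed arms are explored automatically.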
Related papers
- Deep Ensembles Meets Quantile Regression: Uncertainty-aware Imputation for Time Series [45.76310830281876]
We propose Quantile Sub-Ensembles, a novel method to estimate uncertainty with an ensemble of quantile-regression-based task networks.
Our method not only produces accurate imputations that are robust to high missing rates, but is also computationally efficient due to the fast training of its non-generative model.
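A rough sketch of the quantile-sub-ensemble idea, with gradient-boosted quantile regressors on bootstrap resamples standing in for the paper's quantile-regression task networks; the data, quantile levels, and ensemble size are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2 + 0.1 * np.abs(X[:, 0]))

def quantile_subensemble(X, y, quantiles=(0.1, 0.5, 0.9), n_members=5):
    """For each quantile, fit a small ensemble on bootstrap resamples;
    averaging members stabilises each quantile estimate."""
    models = {}
    for q in quantiles:
        members = []
        for _ in range(n_members):
            idx = rng.integers(0, len(X), len(X))
            m = GradientBoostingRegressor(loss="quantile", alpha=q,
                                          n_estimators=100, max_depth=2)
            members.append(m.fit(X[idx], y[idx]))
        models[q] = members
    return models

def predict(models, X_new):
    """Per-quantile predictions; the 10%-90% gap serves as an
    uncertainty estimate for each imputed value."""
    return {q: np.mean([m.predict(X_new) for m in ms], axis=0)
            for q, ms in models.items()}

models = quantile_subensemble(X, y)
preds = predict(models, np.array([[0.0], [2.5]]))
print({q: np.round(p, 2) for q, p in preds.items()})
```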
arXiv Detail & Related papers (2023-12-03T05:52:30Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
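The PSR machinery is involved, but the bonus-as-distance-bound idea can be shown in a one-step toy: for Bernoulli models the total variation distance between the estimated and true model is exactly |p_hat - p|, so a Hoeffding radius upper-bounds it with high probability. A sketch under that drastic simplification (horizon and confidence level are assumptions, and this is not the paper's PSR algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)

def ucb_tv_toy(true_means, horizon=2000, delta=0.05):
    """UCB whose bonus w.h.p. upper-bounds TV(Ber(p_hat), Ber(p)) = |p_hat - p|."""
    k = len(true_means)
    counts, sums, total = np.zeros(k), np.zeros(k), 0.0
    for t in range(horizon):
        if t < k:
            arm = t                                 # play each arm once
        else:
            p_hat = sums / counts
            bonus = np.sqrt(np.log(2.0 * horizon / delta) / (2.0 * counts))
            arm = int(np.argmax(p_hat + bonus))     # optimism under uncertainty
        reward = float(rng.random() < true_means[arm])
        counts[arm] += 1.0
        sums[arm] += reward
        total += reward
    return total

print(ucb_tv_toy([0.2, 0.5, 0.7]))   # close to 0.7 * horizon
```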
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - Policy Optimization for Personalized Interventions in Behavioral Health [8.10897203067601]
Behavioral health interventions, delivered through digital platforms, have the potential to significantly improve health outcomes.
We study the problem of optimizing personalized interventions for patients to maximize a long-term outcome.
We present a new approach for this problem that we dub DecompPI, which decomposes the state space for a system of patients to the individual level.
arXiv Detail & Related papers (2023-03-21T21:42:03Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning [2.0625936401496237]
Recent advances in mobile health (mHealth) technology provide an effective way to monitor individuals' health statuses and deliver just-in-time personalized interventions.
The practical use of mHealth technology raises unique challenges to existing methodologies on learning an optimal dynamic treatment regime.
We propose a Proximal Temporal Learning framework to estimate an optimal regime that is adaptively adjusted between deterministic and sparse policy models.
arXiv Detail & Related papers (2021-10-20T18:38:22Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
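A minimal sketch of this kind of Bayesian-optimization loop for a one-dimensional policy parameter: fit a Gaussian-process surrogate to noisy policy returns and pick the next evaluation by expected improvement. The paper additionally models the objective's gradient; this sketch, with a hypothetical objective and kernel settings, shows only the surrogate-driven search.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

def rbf(a, b, ls=0.5):
    """Squared-exponential kernel between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_tr, y_tr, x_te, noise=1e-2):
    """GP posterior mean and std at test points, given noisy observations."""
    K = rbf(x_tr, x_tr) + noise * np.eye(len(x_tr))
    Ks = rbf(x_tr, x_te)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y_tr
    var = 1.0 - np.sum(Ks * sol, axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def objective(theta):
    """Hypothetical noisy 'policy return' for a scalar policy parameter."""
    return -(theta - 1.2) ** 2 + 0.05 * rng.normal()

# BO loop: fit GP to evaluations, maximize expected improvement on a grid.
grid = np.linspace(-2, 3, 400)
xs = list(rng.uniform(-2, 3, 3))
ys = [objective(x) for x in xs]
for _ in range(15):
    mu, sd = gp_posterior(np.array(xs), np.array(ys), grid)
    best = max(ys)
    z = (mu - best) / sd
    ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)
    x_next = grid[int(np.argmax(ei))]
    xs.append(x_next)
    ys.append(objective(x_next))
print("best policy parameter found:", xs[int(np.argmax(ys))])
```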
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Post-Contextual-Bandit Inference [57.88785630755165]
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good or even best policies.
To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies.
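To make the inference goal concrete, here is a textbook inverse-propensity-weighted estimate of a new policy's value with a normal-approximation confidence interval from logged bandit data; the paper's contribution is intervals that remain valid under adaptive data collection, which this naive baseline does not address. The simulated logging policy and reward model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical logged bandit data: contexts x, binary actions a drawn with
# known propensities p, observed rewards r.
n = 5000
x = rng.normal(size=n)
p = np.clip(1.0 / (1.0 + np.exp(-x)), 0.1, 0.9)   # logged P(a=1 | x)
a = (rng.random(n) < p).astype(int)
r = a * (0.5 + 0.3 * x) + rng.normal(0, 0.5, n)

def ipw_value(policy, x, a, p, r):
    """IPW estimate of E[r] under `policy`, with a 95% normal CI."""
    pi = policy(x)                        # target P(a=1 | x)
    prop = np.where(a == 1, p, 1.0 - p)   # logged propensity of taken action
    match = np.where(a == 1, pi, 1.0 - pi)
    w = match / prop
    est = np.mean(w * r)
    se = np.std(w * r, ddof=1) / np.sqrt(len(r))
    return est, (est - 1.96 * se, est + 1.96 * se)

print(ipw_value(lambda x: (x > 0).astype(float), x, a, p, r))
```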
arXiv Detail & Related papers (2021-06-01T12:01:51Z) - Doubly-Adaptive Thompson Sampling for Multi-Armed and Contextual Bandits [28.504921333436833]
We propose a variant of a Thompson sampling based algorithm that adaptively re-weights the terms of a doubly robust estimator of the true mean reward of each arm.
The proposed algorithm matches the optimal (minimax) regret rate, and its empirical performance is validated in a semi-synthetic experiment.
We extend this approach to contextual bandits, where there are more sources of bias present apart from the adaptive data collection.
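A sketch of the doubly robust ingredient: the per-arm value estimate combines an outcome-model prediction with an inverse-propensity correction on the rounds where the arm was actually played, so it stays unbiased if either component is correct. The paper's algorithm adaptively re-weights these terms; this plain version, with made-up inputs, shows only the estimator it builds on.

```python
import numpy as np

def dr_arm_mean(mu_hat, actions, arm, rewards, propensities):
    """Doubly robust estimate of one arm's mean reward.

    mu_hat[t]       -- outcome-model prediction for `arm` at round t
    propensities[t] -- probability the algorithm played `arm` at round t
    """
    mu_hat = np.asarray(mu_hat, float)
    taken = np.asarray(actions) == arm
    rewards = np.asarray(rewards, float)
    props = np.asarray(propensities, float)
    # IPW correction only where the arm was played; zero elsewhere.
    correction = np.where(taken, (rewards - mu_hat) / props, 0.0)
    return float(np.mean(mu_hat + correction))

# Tiny made-up log: 4 rounds, arm 0 played at rounds 0 and 2.
print(dr_arm_mean(mu_hat=[0.4, 0.5, 0.4, 0.5],
                  actions=[0, 1, 0, 1], arm=0,
                  rewards=[1.0, 0.0, 0.0, 1.0],
                  propensities=[0.5, 0.5, 0.8, 0.5]))
```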
arXiv Detail & Related papers (2021-02-25T22:29:25Z) - Batched Neural Bandits [107.5072688105936]
BatchNeuralUCB combines neural networks with optimism to address the exploration-exploitation tradeoff.
We prove that BatchNeuralUCB achieves the same regret as the fully sequential version while reducing the number of policy updates considerably.
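A sketch of the batching mechanism with a linear model standing in for the neural network: UCB scores are computed from parameters frozen at the last batch boundary, so the policy is updated horizon/batch_size times instead of every round. The dimensions, arm count, and confidence multiplier are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
d, k, horizon, batch_size = 5, 3, 2000, 100
theta_true = rng.normal(size=(k, d))

A = [np.eye(d) for _ in range(k)]           # per-arm regularized Gram matrices
b = [np.zeros(d) for _ in range(k)]
A_inv = [np.eye(d) for _ in range(k)]       # frozen between batch updates
theta = [np.zeros(d) for _ in range(k)]     # frozen between batch updates
pending = []                                # observations since last update

for t in range(horizon):
    x = rng.normal(size=d) / np.sqrt(d)
    # Scores use the parameters frozen at the last batch boundary.
    ucb = [theta[i] @ x + np.sqrt(x @ A_inv[i] @ x) for i in range(k)]
    arm = int(np.argmax(ucb))
    reward = theta_true[arm] @ x + 0.1 * rng.normal()
    pending.append((arm, x, reward))
    if (t + 1) % batch_size == 0:           # policy update only here
        for a_, x_, r_ in pending:
            A[a_] += np.outer(x_, x_)
            b[a_] += r_ * x_
        for i in range(k):
            A_inv[i] = np.linalg.inv(A[i])
            theta[i] = A_inv[i] @ b[i]
        pending.clear()

print("policy updates:", horizon // batch_size)
```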
arXiv Detail & Related papers (2021-02-25T17:36:44Z) - Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
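The paper's sampler targets completely random measures, but the auxiliary-slice-variable idea itself is easy to show on a one-dimensional density (stepping-out and shrinkage in the style of Neal, 2003); this standalone toy is not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(6)

def slice_sample(logpdf, x0, n_draws, w=1.0):
    """Basic 1-D slice sampler with stepping-out and shrinkage."""
    draws, x = [], x0
    for _ in range(n_draws):
        # Auxiliary slice variable: uniform height under the density at x.
        log_u = logpdf(x) + np.log(rng.random())
        lo = x - w * rng.random()            # randomly placed initial bracket
        hi = lo + w
        while logpdf(lo) > log_u:            # step out left
            lo -= w
        while logpdf(hi) > log_u:            # step out right
            hi += w
        while True:                          # shrink until a point lies on the slice
            x_new = rng.uniform(lo, hi)
            if logpdf(x_new) > log_u:
                x = x_new
                break
            if x_new < x:
                lo = x_new
            else:
                hi = x_new
        draws.append(x)
    return np.array(draws)

samples = slice_sample(lambda x: -0.5 * x * x, 0.0, 5000)
print(samples.mean(), samples.std())         # approx 0 and 1 for a standard normal
```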
arXiv Detail & Related papers (2020-06-24T17:53:53Z) - Online Batch Decision-Making with High-Dimensional Covariates [20.06690325969748]
We propose and investigate a class of new algorithms for sequential decision making that interact with a batch of users simultaneously instead of a single user at each decision epoch.
We deliver a solution, named the Teamwork LASSO Bandit algorithm, that resolves a batch version of the explore-exploit dilemma by switching between a teamwork stage and a selfish stage during the decision process.
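A rough sketch of batch-mode decision-making with sparse per-arm reward models: each incoming batch of users is served with the current LASSO estimates (a crude epsilon-greedy step stands in for the paper's teamwork/selfish alternation), and the models are refit once per batch rather than per user. Problem sizes, penalty, and exploration rate are all assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
d, k, batch_size, n_batches, eps = 20, 2, 25, 30, 0.1
beta_true = np.zeros((k, d))
beta_true[0, :3], beta_true[1, 3:6] = 1.0, 1.0   # sparse true reward models

X_hist, a_hist, r_hist = [], [], []
models = [None] * k
for _ in range(n_batches):
    for x in rng.normal(size=(batch_size, d)):    # a batch of users arrives
        if any(m is None for m in models) or rng.random() < eps:
            arm = int(rng.integers(k))            # explore
        else:
            arm = int(np.argmax([m.predict(x[None])[0] for m in models]))
        reward = beta_true[arm] @ x + 0.1 * rng.normal()
        X_hist.append(x)
        a_hist.append(arm)
        r_hist.append(reward)
    # Refit the sparse per-arm models once per batch, not per user.
    for i in range(k):
        rows = [j for j, a in enumerate(a_hist) if a == i]
        if len(rows) >= 10:
            models[i] = Lasso(alpha=0.05).fit(
                np.asarray(X_hist)[rows], np.asarray(r_hist)[rows])
print("nonzeros in arm-0 model:", int(np.sum(models[0].coef_ != 0)))
```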
arXiv Detail & Related papers (2020-02-21T17:36:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.