Thompson sampling for zero-inflated count outcomes with an application
to the Drink Less mobile health study
- URL: http://arxiv.org/abs/2311.14359v1
- Date: Fri, 24 Nov 2023 09:02:24 GMT
- Title: Thompson sampling for zero-inflated count outcomes with an application to the Drink Less mobile health study
- Authors: Xueqing Liu, Nina Deliu, Tanujit Chakraborty, Lauren Bell, Bibhas
Chakraborty
- Score: 1.7097581131234332
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mobile health (mHealth) technologies aim to improve distal outcomes, such as
clinical conditions, by optimizing proximal outcomes through just-in-time
adaptive interventions. Contextual bandits provide a suitable framework for
customizing such interventions according to individual time-varying contexts,
intending to maximize cumulative proximal outcomes. However, unique challenges
such as modeling count outcomes within bandit frameworks have hindered the
widespread application of contextual bandits to mHealth studies. The current
work addresses this challenge by incorporating count data models into online
decision-making approaches. Specifically, we combine four common offline count
data models (Poisson, negative binomial, zero-inflated Poisson, and
zero-inflated negative binomial regressions) with Thompson sampling, a popular
contextual bandit algorithm. The proposed algorithms are motivated by and
evaluated on a real dataset from the Drink Less trial, where they are shown to
improve user engagement with the mHealth system. The proposed methods are
further evaluated on simulated data, achieving improvement in maximizing
cumulative proximal outcomes over existing algorithms. Theoretical results on
regret bounds are also derived. A user-friendly R package countts that
implements the proposed methods for assessing contextual bandit algorithms is
made publicly available at https://cran.r-project.org/web/packages/countts.
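The paper's algorithms combine contextual count-data regressions (Poisson, negative binomial, ZIP, ZINB) with Thompson sampling, as implemented in the countts package. As a rough illustration only, the sketch below shows a simplified, non-contextual variant: a zero-inflated Poisson reward simulator paired with Gamma-Poisson Thompson sampling. All names (`zip_pmf`, `PoissonThompson`, the chosen priors) are hypothetical and not taken from the paper or the package.

```python
import math
import random


def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson pmf: a point mass at zero with probability pi,
    mixed with a Poisson(lam) component with probability 1 - pi."""
    base = math.exp(-lam) * lam ** k / math.factorial(k)
    return (pi if k == 0 else 0.0) + (1.0 - pi) * base


def zip_sample(pi, lam):
    """Draw one zero-inflated Poisson variate (Knuth's method for the
    Poisson component)."""
    if random.random() < pi:
        return 0
    threshold, k, p = math.exp(-lam), 0, 1.0
    while p > threshold:
        k += 1
        p *= random.random()
    return k - 1


class PoissonThompson:
    """Thompson sampling for count rewards with a conjugate Gamma prior on
    each arm's Poisson rate: Gamma(a, b) updates to Gamma(a + reward, b + 1)
    after each observation."""

    def __init__(self, n_arms, prior_shape=1.0, prior_rate=1.0):
        self.shape = [prior_shape] * n_arms
        self.rate = [prior_rate] * n_arms

    def select_arm(self):
        # Sample a plausible rate from each arm's posterior, play the argmax.
        draws = [random.gammavariate(a, 1.0 / b)
                 for a, b in zip(self.shape, self.rate)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, reward):
        self.shape[arm] += reward
        self.rate[arm] += 1.0
```

A plain Gamma-Poisson posterior is a deliberate simplification here: it remains conjugate (hence cheap to sample from) even when the simulated rewards are zero-inflated, at the cost of a misspecified likelihood, which is precisely the kind of mismatch the paper's ZIP/ZINB-based algorithms are designed to avoid.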
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Mathematics of statistical sequential decision-making: concentration, risk-awareness and modelling in stochastic bandits, with applications to bariatric surgery [2.266258510757917]
This thesis aims to study some of the mathematical challenges that arise in the analysis of statistical sequential decision-making algorithms for postoperative patients follow-up.
We developed new safe, anytime-valid concentration bounds, introduced a new framework for risk-aware contextual bandits and analysed a novel class of nonparametric bandit algorithms under weak assumptions.
As a first step towards personalised postoperative follow-up recommendations, we developed with medical doctors and surgeons an interpretable machine learning model to predict the long-term weight trajectories of patients after bariatric surgery.
arXiv Detail & Related papers (2024-05-03T10:50:30Z) - Deep Ensembles Meets Quantile Regression: Uncertainty-aware Imputation
for Time Series [49.992908221544624]
Time series data often contain numerous missing values; filling them in is the time series imputation task.
Previous deep learning methods have been shown to be effective for time series imputation.
We propose a non-generative time series imputation method that produces accurate imputations with inherent uncertainty.
arXiv Detail & Related papers (2023-12-03T05:52:30Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State
Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via
pT-Learning [2.0625936401496237]
Recent advances in mobile health (mHealth) technology provide an effective way to monitor individuals' health statuses and deliver just-in-time personalized interventions.
The practical use of mHealth technology raises unique challenges to existing methodologies on learning an optimal dynamic treatment regime.
We propose a Proximal Temporal Learning framework to estimate an optimal regime that adaptively adjusts between deterministic and sparse policy models.
arXiv Detail & Related papers (2021-10-20T18:38:22Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Batched Neural Bandits [107.5072688105936]
BatchNeuralUCB combines neural networks with optimism to address the exploration-exploitation tradeoff.
We prove that BatchNeuralUCB achieves the same regret as the fully sequential version while reducing the number of policy updates considerably.
arXiv Detail & Related papers (2021-02-25T17:36:44Z) - Online Batch Decision-Making with High-Dimensional Covariates [20.06690325969748]
We propose and investigate a class of new algorithms for sequential decision making that interact with a batch of users simultaneously instead of a single user at each decision epoch.
We deliver a solution, named the Teamwork LASSO Bandit algorithm, that resolves a batch version of the explore-exploit dilemma by switching between a teamwork stage and a selfish stage during the whole decision process.
arXiv Detail & Related papers (2020-02-21T17:36:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.