DTR Bandit: Learning to Make Response-Adaptive Decisions With Low Regret
- URL: http://arxiv.org/abs/2005.02791v3
- Date: Tue, 20 Sep 2022 20:59:54 GMT
- Title: DTR Bandit: Learning to Make Response-Adaptive Decisions With Low Regret
- Authors: Yichun Hu and Nathan Kallus
- Abstract summary: Dynamic treatment regimes (DTRs) are personalized, adaptive, multi-stage treatment plans that adapt treatment decisions to an individual's initial features and to intermediate outcomes and features at each subsequent stage.
We propose a novel algorithm that, by carefully balancing exploration and exploitation, is guaranteed to achieve rate-optimal regret when the transition and reward models are linear.
- Score: 59.81290762273153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dynamic treatment regimes (DTRs) are personalized, adaptive, multi-stage
treatment plans that adapt treatment decisions both to an individual's initial
features and to intermediate outcomes and features at each subsequent stage,
which are affected by decisions in prior stages. Examples include personalized
first- and second-line treatments of chronic conditions like diabetes, cancer,
and depression, which adapt to patient response to first-line treatment,
disease progression, and individual characteristics. While existing literature
mostly focuses on estimating the optimal DTR from offline data such as from
sequentially randomized trials, we study the problem of developing the optimal
DTR in an online manner, where the interaction with each individual affects both
our cumulative reward and our data collection for future learning. We term this
the DTR bandit problem. We propose a novel algorithm that, by carefully
balancing exploration and exploitation, is guaranteed to achieve rate-optimal
regret when the transition and reward models are linear. We demonstrate our
algorithm and its benefits both in synthetic experiments and in a case study of
adaptive treatment of major depressive disorder using real-world data.
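To make the setting concrete, here is a minimal sketch of a hypothetical two-stage problem with linear reward models, solved by a simple epsilon-greedy learner with ridge-regression estimates. This illustrates only the DTR bandit problem structure, not the paper's rate-optimal algorithm; all parameters and dynamics below are invented assumptions.

```python
# Hypothetical sketch of the DTR bandit setting: two stages, linear reward
# models per (stage, action), epsilon-greedy exploration with ridge
# regression. All parameters and dynamics below are invented; the paper's
# algorithm is more refined and achieves rate-optimal regret.
import numpy as np

rng = np.random.default_rng(0)
d, T, eps = 3, 2000, 0.1                       # feature dim, horizon, exploration rate

# Unknown ground-truth linear reward parameters, one per (stage, action).
theta = {(s, a): rng.normal(size=d) for s in (1, 2) for a in (0, 1)}

def reward(stage, action, x):
    return theta[(stage, action)] @ x + 0.1 * rng.normal()

# Ridge-regression sufficient statistics per (stage, action).
A = {k: np.eye(d) for k in theta}              # X'X + I
b = {k: np.zeros(d) for k in theta}            # X'y
est = lambda k: np.linalg.solve(A[k], b[k])

def choose(stage, x):
    if rng.random() < eps:                     # explore uniformly
        return int(rng.integers(2))
    return int(est((stage, 1)) @ x > est((stage, 0)) @ x)  # exploit greedily

total = 0.0
for t in range(T):
    x1 = rng.normal(size=d)                    # individual's initial features
    a1 = choose(1, x1)
    r1 = reward(1, a1, x1)
    x2 = rng.normal(size=d) + 0.5 * a1         # intermediate features depend on a1
    a2 = choose(2, x2)
    r2 = reward(2, a2, x2)
    for k, x, r in (((1, a1), x1, r1), ((2, a2), x2, r2)):
        A[k] += np.outer(x, x)
        b[k] += r * x
    total += r1 + r2

# Caveat: greedy stage-1 choices here ignore how a1 shifts stage-2 features;
# the paper's algorithm accounts for this through the linear transition model.
print(f"average per-episode reward: {total / T:.3f}")
```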
Related papers
- Robust Learning for Optimal Dynamic Treatment Regimes with Observational Data [0.0]
We study statistical learning of optimal dynamic treatment regimes (DTRs) that guide the optimal treatment assignment for each individual at each stage based on the individual's history.
We propose a step-wise doubly-robust approach to learn the optimal DTR using observational data under the assumption of sequential ignorability.
arXiv Detail & Related papers (2024-03-30T02:33:39Z)
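A minimal one-stage illustration of the doubly-robust idea behind such step-wise estimators: the AIPW score remains consistent if either the propensity model or the outcome model is correct. The data-generating process and nuisance models below are made up for illustration.

```python
# Hypothetical one-stage illustration of the doubly-robust (AIPW) score;
# a step-wise procedure applies this idea backward through the stages.
# The data-generating process and nuisance models below are made up.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))                   # true propensity P(A=1 | X)
a = rng.binomial(1, e)
y = x + a * (1 + x) + rng.normal(size=n)   # true treatment effect is 1 + x

# Plug-in nuisance estimates (here set to the truth for simplicity).
e_hat = np.clip(e, 0.05, 0.95)             # propensity model, clipped for stability
mu1, mu0 = x + (1 + x), x                  # outcome models under A=1 and A=0

# AIPW scores: consistent if either the propensity or the outcome model is correct.
psi1 = mu1 + a * (y - mu1) / e_hat
psi0 = mu0 + (1 - a) * (y - mu0) / (1 - e_hat)
print(f"estimated average treatment effect: {(psi1 - psi0).mean():.3f}")  # near 1.0
```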
- TCFimt: Temporal Counterfactual Forecasting from Individual Multiple Treatment Perspective [50.675845725806724]
We propose a comprehensive framework of temporal counterfactual forecasting from an individual multiple treatment perspective (TCFimt).
TCFimt constructs adversarial tasks in a seq2seq framework to alleviate selection and time-varying bias and designs a contrastive learning-based block to decouple a mixed treatment effect into separated main treatment effects and causal interactions.
The proposed method performs better than state-of-the-art methods both in predicting future outcomes under specific treatments and in choosing the optimal treatment type and timing.
arXiv Detail & Related papers (2022-12-17T15:01:05Z)
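As a rough illustration of the contrastive ingredient, here is a generic InfoNCE-style loss of the kind such a decoupling block might minimize; the exact loss and architecture in TCFimt may differ.

```python
# Hypothetical InfoNCE-style contrastive loss, the generic form of objective
# a representation-decoupling block might minimize; TCFimt's actual loss and
# architecture may differ.
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """anchors, positives: (n, d); row i of positives is the match for row i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # (n, n) similarities
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_softmax).mean()                  # diagonal = positive pairs

rng = np.random.default_rng(2)
z = rng.normal(size=(8, 16))
print(info_nce(z, z + 0.01 * rng.normal(size=z.shape)))  # small loss: pairs aligned
```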
- Federated Offline Reinforcement Learning [55.326673977320574]
We propose a multi-site Markov decision process model that allows for both homogeneous and heterogeneous effects across sites.
We design the first federated policy optimization algorithm for offline RL with sample complexity guarantees.
We give a theoretical guarantee for the proposed algorithm, where the suboptimality of the learned policies is comparable to the rate achieved as if the data were not distributed.
arXiv Detail & Related papers (2022-06-11T18:03:26Z)
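A hedged sketch of the federated ingredient: each site fits a local linear value model on its own offline data and only parameters, weighted by sample size, are aggregated; raw trajectories never leave a site. The linear model and plain averaging here are assumptions, not the paper's algorithm.

```python
# Hypothetical federated aggregation: sites share only fitted parameters,
# weighted by local sample size. The linear value model and plain averaging
# are simplifying assumptions, not the paper's method.
import numpy as np

rng = np.random.default_rng(3)
d = 4
theta_true = rng.normal(size=d)

def local_fit(n):
    """Ridge fit of E[return | features] from one site's offline data."""
    X = rng.normal(size=(n, d))
    y = X @ theta_true + 0.5 * rng.normal(size=n)
    return np.linalg.solve(X.T @ X + np.eye(d), X.T @ y), n

fits = [local_fit(n) for n in (200, 500, 1000, 300, 800)]   # five sites
weights = np.array([n for _, n in fits], dtype=float)
theta_fed = sum(w * th for (th, _), w in zip(fits, weights / weights.sum()))
print(f"parameter error: {np.linalg.norm(theta_fed - theta_true):.4f}")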
- Learning Optimal Dynamic Treatment Regimes Using Causal Tree Methods in Medicine [20.401805132360654]
We develop two novel methods for learning optimal dynamic treatment regimes (DTRs)
Our methods are based on a data-driven estimation of heterogeneous treatment effects using causal tree methods.
We evaluate our proposed methods using synthetic data and then apply them to real-world data from intensive care units.
arXiv Detail & Related papers (2022-04-14T17:27:08Z)
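A hedged sketch of the backward-induction recipe, with off-the-shelf regression trees standing in for the paper's causal tree estimators (which use honest splitting and treatment-effect-specific objectives); the data-generating process is invented.

```python
# Hypothetical two-stage backward induction with plain regression trees in
# place of the paper's causal tree estimators. Data below are invented.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
n = 4000
x1 = rng.normal(size=(n, 2))                    # stage-1 covariates
a1 = rng.binomial(1, 0.5, size=n)               # randomized stage-1 treatment
x2 = x1 + 0.5 * a1[:, None] + 0.1 * rng.normal(size=(n, 2))
a2 = rng.binomial(1, 0.5, size=n)
y = x2[:, 0] + a2 * x2[:, 1] + a1 * x1[:, 0] + rng.normal(size=n)

def fit_q(X, a, target):
    """One tree per arm; returns predicted outcome under each action."""
    models = [DecisionTreeRegressor(max_depth=4).fit(X[a == arm], target[a == arm])
              for arm in (0, 1)]
    return np.column_stack([m.predict(X) for m in models])

q2 = fit_q(x2, a2, y)                 # stage-2 Q-function under each action
v2 = q2.max(axis=1)                   # value of acting optimally at stage 2
q1 = fit_q(x1, a1, v2)                # stage-1 pseudo-outcome: optimal continuation value
print("stage-1 regime treats fraction:", (q1[:, 1] > q1[:, 0]).mean())
```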
- Ambiguous Dynamic Treatment Regimes: A Reinforcement Learning Approach [0.0]
Dynamic Treatment Regimes (DTRs) are widely studied to formalize sequential treatment decision-making.
We develop Reinforcement Learning methods to efficiently learn optimal treatment regimes.
arXiv Detail & Related papers (2021-12-08T20:22:04Z)
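For flavor, a plain tabular Q-learning sketch for a two-stage treatment problem; note that the paper's subject is *ambiguous* DTRs (uncertainty over the dynamics themselves), which this vanilla version with invented dynamics does not address.

```python
# Hypothetical plain tabular Q-learning for a two-stage treatment problem.
# Ambiguity over the dynamics, the paper's focus, is not modeled here.
import numpy as np

rng = np.random.default_rng(7)
S, A_n, episodes, alpha, eps = 3, 2, 5000, 0.1, 0.1
Q = np.zeros((2, S, A_n))                        # Q[stage, state, action]

def step(s, a):
    """Made-up dynamics: action 1 tends to improve the health state."""
    s_next = min(S - 1, s + int(rng.random() < 0.3 + 0.4 * a))
    return s_next, float(s_next)                 # reward = resulting state

for _ in range(episodes):
    s = int(rng.integers(S))
    for stage in (0, 1):
        a = int(rng.integers(A_n)) if rng.random() < eps else int(Q[stage, s].argmax())
        s_next, r = step(s, a)
        target = r + (Q[1, s_next].max() if stage == 0 else 0.0)
        Q[stage, s, a] += alpha * (target - Q[stage, s, a])
        s = s_next

print("learned stage-0 policy per state:", Q[0].argmax(axis=1))  # likely favors action 1
```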
- Disentangled Counterfactual Recurrent Networks for Treatment Effect Inference over Time [71.30985926640659]
We introduce the Disentangled Counterfactual Recurrent Network (DCRN), a sequence-to-sequence architecture that estimates treatment outcomes over time.
With an architecture designed entirely around the causal structure of treatment influence over time, DCRN advances both forecast accuracy and disease understanding.
We demonstrate that DCRN outperforms current state-of-the-art methods in forecasting treatment responses, on both real and simulated data.
arXiv Detail & Related papers (2021-12-07T16:40:28Z)
- Continuous Treatment Recommendation with Deep Survival Dose Response Function [3.705291460388999]
We propose a general formulation for continuous treatment recommendation problems in settings with clinical survival data.
The treatment effect estimated by DeepSDRF enables us to develop recommender algorithms that correct for selection bias.
To our knowledge, this is the first use of causal models to address continuous treatment effects with observational data in a medical context.
arXiv Detail & Related papers (2021-08-24T00:19:04Z)
- DeepRite: Deep Recurrent Inverse TreatmEnt Weighting for Adjusting Time-varying Confounding in Modern Longitudinal Observational Data [68.29870617697532]
We propose Deep Recurrent Inverse TreatmEnt weighting (DeepRite) for time-varying confounding in longitudinal data.
DeepRite is shown to recover the ground truth from synthetic data and to estimate unbiased treatment effects from real data.
arXiv Detail & Related papers (2020-10-28T15:05:08Z)
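The classical quantity behind such methods is the stabilized inverse-probability-of-treatment weight. The sketch below computes it over two time points with known, made-up logistic propensity models, whereas DeepRite estimates these propensities with recurrent networks.

```python
# Hypothetical stabilized inverse-probability-of-treatment weights over two
# time points; the logistic propensity models here are made-up and known,
# while DeepRite would estimate them from data with recurrent networks.
import numpy as np

rng = np.random.default_rng(5)
n = 10000
sigmoid = lambda z: 1 / (1 + np.exp(-z))

l1 = rng.normal(size=n)                          # time-1 confounder
p1 = sigmoid(l1); a1 = rng.binomial(1, p1)       # time-1 treatment
l2 = l1 + a1 + rng.normal(size=n)                # time-2 confounder, affected by a1
p2 = sigmoid(0.5 * l2); a2 = rng.binomial(1, p2) # time-2 treatment

def prob_of(a, p):
    """P(A = a) when P(A = 1) = p, elementwise."""
    return a * p + (1 - a) * (1 - p)

# Stabilized weights: marginal over conditional treatment probabilities,
# multiplied across time points; their mean should be close to 1.
sw = (prob_of(a1, a1.mean()) / prob_of(a1, p1)) * \
     (prob_of(a2, a2.mean()) / prob_of(a2, p2))
print(f"mean stabilized weight: {sw.mean():.3f}")
```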
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
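A hedged tabular sketch of the optimistic value iteration backbone with a count-based exploration bonus; DOVI's actual contribution, the backdoor-style adjustment that deconfounds the observational data, is omitted here.

```python
# Hypothetical tabular optimistic value iteration with a count-based bonus.
# The deconfounding adjustment that is DOVI's key contribution is omitted.
import numpy as np

rng = np.random.default_rng(6)
S, A_n, H = 4, 2, 5                    # states, actions, horizon

P_hat = rng.dirichlet(np.ones(S), size=(S, A_n))   # estimated transitions
R_hat = rng.uniform(size=(S, A_n))                 # estimated mean rewards
N = rng.integers(1, 50, size=(S, A_n))             # pretend visit counts
bonus = 1.0 / np.sqrt(N)                           # optimism shrinks as N grows

V = np.zeros(S)
for _ in range(H):                     # backward value iteration with bonus
    Q = R_hat + bonus + P_hat @ V      # P_hat @ V averages over next states
    V = np.clip(Q.max(axis=1), 0.0, float(H))      # keep values in a valid range
print("optimistic state values:", np.round(V, 2))
```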
- Multicategory Angle-based Learning for Estimating Optimal Dynamic Treatment Regimes with Censored Data [12.499787110182632]
An optimal dynamic treatment regime (DTR) consists of a sequence of decision rules aimed at maximizing long-term benefits.
In this paper, we develop a novel angle-based approach to target the optimal DTR under a multicategory treatment framework.
Our numerical studies show that the proposed method outperforms competing methods in terms of maximizing the conditional survival function.
arXiv Detail & Related papers (2020-01-14T05:19:15Z)