Related papers: Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning

Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning

URL: http://arxiv.org/abs/2110.10719v1
Date: Wed, 20 Oct 2021 18:38:22 GMT
Title: Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning
Authors: Wenzhuo Zhou, Ruoqing Zhu and Annie Qu
Abstract summary: Recent advances in mobile health (mHealth) technology provide an effective way to monitor individuals' health statuses and deliver just-in-time personalized interventions. The practical use of mHealth technology raises unique challenges to existing methodologies on learning an optimal dynamic treatment regime. We propose a Proximal Temporal Learning framework to estimate an optimal regime adaptively adjusted between deterministic and sparse policy models.
Score: 2.0625936401496237
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in mobile health (mHealth) technology provide an effective way to monitor individuals' health statuses and deliver just-in-time personalized interventions. However, the practical use of mHealth technology raises unique challenges to existing methodologies on learning an optimal dynamic treatment regime. Many mHealth applications involve decision-making with large numbers of intervention options and under an infinite time horizon setting where the number of decision stages diverges to infinity. In addition, temporary medication shortages may cause optimal treatments to be unavailable, while it is unclear what alternatives can be used. To address these challenges, we propose a Proximal Temporal consistency Learning (pT-Learning) framework to estimate an optimal regime that is adaptively adjusted between deterministic and stochastic sparse policy models. The resulting minimax estimator avoids the double sampling issue in the existing algorithms. It can be further simplified and can easily incorporate off-policy data without mismatched distribution corrections. We study theoretical properties of the sparse policy and establish finite-sample bounds on the excess risk and performance error. The proposed method is implemented by our proximalDTR package and is evaluated through extensive simulation studies and the OhioT1DM mHealth dataset.

Related papers

Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models [56.92178753201331]
We tackle average-reward infinite-horizon POMDPs with an unknown transition model. We present a novel and simple estimator that overcomes this barrier.
arXiv Detail & Related papers (2025-01-30T22:29:41Z)
Optimization-Driven Adaptive Experimentation [7.948144726705323]
Real-world experiments involve batched & delayed feedback, non-stationarity, multiple objectives & constraints, and (often some) personalization. Tailoring adaptive methods to address these challenges on a per-problem basis is infeasible, and static designs remain the de facto standard. We present a mathematical programming formulation that can flexibly incorporate a wide range of objectives, constraints, and statistical procedures.
arXiv Detail & Related papers (2024-08-08T16:29:09Z)
Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains. Some SDMU are naturally modeled as Multistage Problems (MSPs) but the resulting optimizations are notoriously challenging from a computational standpoint. This paper introduces a novel approach Two-Stage General Decision Rules (TS-GDR) to generalize the policy space beyond linear functions. The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-LDR)
arXiv Detail & Related papers (2024-05-23T18:19:47Z)
Stage-Aware Learning for Dynamic Treatments [3.6923632650826486]
We propose a novel individualized learning method for dynamic treatment regimes. By relaxing the restriction that the observed trajectory must be fully aligned with the optimal treatments, our approach substantially improves the sample efficiency and stability of IPWE-based methods.
arXiv Detail & Related papers (2023-10-30T06:35:31Z)
Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs) This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models. In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
Policy Optimization for Personalized Interventions in Behavioral Health [8.10897203067601]
Behavioral health interventions, delivered through digital platforms, have the potential to significantly improve health outcomes. We study the problem of optimizing personalized interventions for patients to maximize a long-term outcome. We present a new approach for this problem that we dub DecompPI, which decomposes the state space for a system of patients to the individual level.
arXiv Detail & Related papers (2023-03-21T21:42:03Z)
Quasi-optimal Reinforcement Learning with Continuous Actions [8.17049210746654]
We develop a novel emphquasi-optimal learning algorithm, which can be easily optimized in off-policy settings. We evaluate our algorithm with comprehensive simulated experiments and a dose suggestion real application to Ohio Type 1 diabetes dataset.
arXiv Detail & Related papers (2023-01-21T11:30:13Z)
Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new algorithm for a policy gradient in TMDPs by a simple extension of the proximal policy optimization (PPO) algorithm. We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations [84.42837346400151]
Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare. Existing causal inference approaches consider regular, discrete-time intervals between observations and treatment decisions. We propose a controllable simulation environment based on a model of tumor growth for a range of scenarios.
arXiv Detail & Related papers (2022-06-16T17:15:15Z)
Federated Offline Reinforcement Learning [55.326673977320574]
We propose a multi-site Markov decision process model that allows for both homogeneous and heterogeneous effects across sites. We design the first federated policy optimization algorithm for offline RL with sample complexity. We give a theoretical guarantee for the proposed algorithm, where the suboptimality for the learned policies is comparable to the rate as if data is not distributed.
arXiv Detail & Related papers (2022-06-11T18:03:26Z)
Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems. Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs. This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
Multicategory Angle-based Learning for Estimating Optimal Dynamic Treatment Regimes with Censored Data [12.499787110182632]
An optimal treatment regime (DTR) consists of a sequence of decision rules in maximizing long-term benefits. In this paper, we develop a novel angle-based approach to target the optimal DTR under a multicategory treatment framework. Our numerical studies show that the proposed method outperforms competing methods in terms of maximizing the conditional survival function.
arXiv Detail & Related papers (2020-01-14T05:19:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.