Policy Learning for Optimal Dynamic Treatment Regimes with Observational Data
- URL: http://arxiv.org/abs/2404.00221v7
- Date: Tue, 20 May 2025 09:50:20 GMT
- Title: Policy Learning for Optimal Dynamic Treatment Regimes with Observational Data
- Authors: Shosei Sakaguchi
- Abstract summary: We study the statistical learning of optimal dynamic treatment regimes (DTRs) that determine the optimal treatment assignment for each individual at each stage based on their evolving history.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Public policies and medical interventions often involve dynamic treatment assignments, in which individuals receive a sequence of interventions over multiple stages. We study the statistical learning of optimal dynamic treatment regimes (DTRs) that determine the optimal treatment assignment for each individual at each stage based on their evolving history. We propose a novel, doubly robust, classification-based method for learning the optimal DTR from observational data under the sequential ignorability assumption. The method proceeds via backward induction: at each stage, it constructs and maximizes an augmented inverse probability weighting (AIPW) estimator of the policy value function to learn the optimal stage-specific policy. We show that the resulting DTR achieves an optimal convergence rate of $n^{-1/2}$ for welfare regret under mild convergence conditions on estimators of the nuisance components.
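The backward-induction procedure described in the abstract can be illustrated with a small simulation. The sketch below is only a minimal illustration under assumed ingredients: the data-generating process, the threshold policy class, and the helper names `aipw_scores` and `best_threshold` are choices made here for exposition, not the paper's implementation. At the final stage it builds AIPW scores for each candidate treatment, picks the stage-specific rule that maximizes their sample average, and then carries the implied pseudo-outcome back to the first stage.

```python
# Minimal two-stage sketch of backward-induction policy learning with AIPW
# scores. The DGP, the threshold policy class, and the helpers below are
# illustrative assumptions, not the paper's implementation.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# --- simulated observational data (sequential ignorability holds by design) ---
X1 = rng.normal(size=n)                                    # stage-1 covariate
D1 = rng.binomial(1, 1 / (1 + np.exp(-0.5 * X1)))          # stage-1 treatment
X2 = 0.5 * X1 + D1 + rng.normal(size=n)                    # intermediate covariate
D2 = rng.binomial(1, 1 / (1 + np.exp(-0.3 * X2)))          # stage-2 treatment
Y = X1 + X2 + D2 * (2.0 * (X2 > 0) - 1.0) + rng.normal(size=n)  # final outcome

def aipw_scores(H, D, Y):
    """AIPW score Gamma_i(d) for each candidate treatment d in {0, 1}."""
    e = LogisticRegression().fit(H, D).predict_proba(H)[:, 1]        # propensity
    scores = np.empty((len(Y), 2))
    for d in (0, 1):
        m = LinearRegression().fit(H[D == d], Y[D == d]).predict(H)  # outcome reg.
        p = e if d == 1 else 1.0 - e
        scores[:, d] = m + (D == d) / p * (Y - m)
    return scores

def best_threshold(x, scores):
    """Maximize the AIPW value estimate over threshold rules 1{x > t}."""
    grid = np.quantile(x, np.linspace(0.05, 0.95, 50))
    values = [np.mean(np.where(x > t, scores[:, 1], scores[:, 0])) for t in grid]
    return grid[int(np.argmax(values))], max(values)

# --- stage 2: learn the last-stage rule from the full history ---
H2 = np.column_stack([X1, D1, X2])
g2 = aipw_scores(H2, D2, Y)
t2, _ = best_threshold(X2, g2)

# pseudo-outcome: AIPW estimate of the outcome when stage 2 follows the learned rule
V2 = np.where(X2 > t2, g2[:, 1], g2[:, 0])

# --- stage 1: backward induction with the pseudo-outcome as the target ---
H1 = X1.reshape(-1, 1)
g1 = aipw_scores(H1, D1, V2)
t1, value = best_threshold(X1, g1)

print(f"learned DTR: D1 = 1{{X1 > {t1:.2f}}}, D2 = 1{{X2 > {t2:.2f}}}; "
      f"estimated welfare = {value:.2f}")
```

In a faithful application the nuisance estimators (propensities and outcome regressions) would typically be cross-fitted and the policy class chosen to satisfy the paper's conditions; the grid search over thresholds merely stands in for maximizing the AIPW value criterion over a policy class.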
Related papers
- Parameterized Diffusion Optimization enabled Autoregressive Ordinal Regression for Diabetic Retinopathy Grading [53.11883409422728]
This work proposes a novel autoregressive ordinal regression method called AOR-DR. We decompose the diabetic retinopathy grading task into a series of ordered steps by fusing the prediction of the previous steps with extracted image features. We exploit the diffusion process to facilitate conditional probability modeling, enabling the direct use of continuous global image features for autoregression.
arXiv Detail & Related papers (2025-07-07T13:22:35Z) - Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models [56.92178753201331]
We tackle average-reward infinite-horizon POMDPs with an unknown transition model. We present a novel and simple estimator that overcomes this barrier.
arXiv Detail & Related papers (2025-01-30T22:29:41Z) - Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach [51.76826149868971]
Policy evaluation via Monte Carlo simulation is at the core of many MC Reinforcement Learning (RL) algorithms.
We propose, as a quality index, a surrogate for the mean squared error of a return estimator that uses trajectories of different lengths.
We present an adaptive algorithm called Robust and Iterative Data collection strategy Optimization (RIDO).
arXiv Detail & Related papers (2024-10-17T11:47:56Z) - Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z) - Stage-Aware Learning for Dynamic Treatments [3.6923632650826486]
We propose a novel individualized learning method for dynamic treatment regimes.
By relaxing the restriction that the observed trajectory must be fully aligned with the optimal treatments, our approach substantially improves the sample efficiency and stability of IPWE-based methods.
arXiv Detail & Related papers (2023-10-30T06:35:31Z) - Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method in multiple tasks of OpenAI Gym with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z) - Efficient and robust transfer learning of optimal individualized treatment regimes with right-censored survival data [7.308241944759317]
An individualized treatment regime (ITR) is a decision rule that assigns treatments based on patients' characteristics.
We propose a doubly robust estimator of the value function, and the optimal ITR is learned by maximizing the value function within a pre-specified class of ITRs.
We evaluate the empirical performance of the proposed method by simulation studies and a real data application of sodium bicarbonate therapy for patients with severe metabolic acidaemia.
arXiv Detail & Related papers (2023-01-13T11:47:10Z) - When AUC meets DRO: Optimizing Partial AUC for Deep Learning with Non-Convex Convergence Guarantee [51.527543027813344]
We propose systematic and efficient gradient-based methods for both one-way and two-way partial AUC (pAUC).
For both one-way and two-way pAUC, we propose two algorithms and prove their convergence for optimizing their two formulations, respectively.
arXiv Detail & Related papers (2022-03-01T01:59:53Z) - Policy Learning for Optimal Individualized Dose Intervals [3.9801611649762263]
We propose a new method to estimate such a policy.
We prove that our estimated policy is consistent, and its risk converges to that of the best-in-class policy at a root-$n$ rate.
arXiv Detail & Related papers (2022-02-24T17:59:20Z) - Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z) - Estimation of Optimal Dynamic Treatment Assignment Rules under Policy Constraints [0.0]
We study estimation of an optimal dynamic treatment regime that guides the optimal treatment assignment for each individual at each stage based on their history.
The paper proposes two estimation methods: one solves the treatment assignment problem sequentially through backward induction, and the other solves the entire problem simultaneously across all stages.
arXiv Detail & Related papers (2021-06-09T12:42:53Z) - Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence [66.83161885378192]
The areas under the ROC curve (AUROC) and the precision-recall curve (AUPRC) are common metrics for evaluating classification performance on imbalanced problems.
We propose a technical method to optimize AUPRC for deep learning.
arXiv Detail & Related papers (2021-04-18T06:22:21Z) - Learning Individualized Treatment Rules with Estimated Translated Inverse Propensity Score [29.606141542532356]
In this paper, we focus on learning individualized treatment rules (ITRs) to derive a treatment policy.
In our framework, we cast ITRs learning as a contextual bandit problem and minimize the expected risk of the treatment policy.
As a long-term goal, our derived policy might eventually lead to better clinical guidelines for the administration of intravenous (IV) fluids and vasopressors (VP).
arXiv Detail & Related papers (2020-07-02T13:13:56Z) - DTR Bandit: Learning to Make Response-Adaptive Decisions With Low Regret [59.81290762273153]
Dynamic treatment regimes (DTRs) are personalized, adaptive, multi-stage treatment plans that adapt treatment decisions to an individual's initial features and to intermediate outcomes and features at each subsequent stage.
We propose a novel algorithm that, by carefully balancing exploration and exploitation, is guaranteed to achieve rate-optimal regret when the transition and reward models are linear.
arXiv Detail & Related papers (2020-05-06T13:03:42Z) - Comment: Entropy Learning for Dynamic Treatment Regimes [58.442274475425144]
JSLZ's approach leverages a rejection-and-sampling estimate of the value of a given decision rule based on inverse probability weighting (IPW) and its interpretation as a weighted (or cost-sensitive) classification.
Their use of smooth classification surrogates enables their careful approach to analyzing distributions.
The IPW estimate is problematic as it leads to weights that discard most of the data and are extremely variable on whatever remains (a minimal numerical sketch of this issue appears after the list below).
arXiv Detail & Related papers (2020-04-06T16:11:05Z) - Multicategory Angle-based Learning for Estimating Optimal Dynamic Treatment Regimes with Censored Data [12.499787110182632]
An optimal dynamic treatment regime (DTR) consists of a sequence of decision rules that maximize long-term benefits.
In this paper, we develop a novel angle-based approach to target the optimal DTR under a multicategory treatment framework.
Our numerical studies show that the proposed method outperforms competing methods in terms of maximizing the conditional survival function.
arXiv Detail & Related papers (2020-01-14T05:19:15Z)
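The weighting issue raised in the Entropy Learning comment above is easy to see numerically. The sketch below is a generic illustration, not code from any of the listed papers: with a two-stage regime, the plain IPW value estimate keeps only trajectories whose observed treatments agree with the candidate rule at every stage, so most weights are exactly zero and the remainder are products of inverse propensities.

```python
# Generic illustration (not from any listed paper) of why the plain IPW value
# estimate of a dynamic treatment regime discards most trajectories: only
# units whose observed treatments match the candidate rule at every stage
# receive nonzero weight, and those weights are products of inverse propensities.
import numpy as np

rng = np.random.default_rng(1)
n = 2000

X1 = rng.normal(size=n)
e1 = 1 / (1 + np.exp(-X1))            # stage-1 behaviour propensity
D1 = rng.binomial(1, e1)
X2 = 0.5 * X1 + D1 + rng.normal(size=n)
e2 = 1 / (1 + np.exp(-X2))            # stage-2 behaviour propensity
D2 = rng.binomial(1, e2)
Y = X1 + X2 + D2 * (X2 > 0) + rng.normal(size=n)

# candidate regime: treat at stage 1 iff X1 > 0, at stage 2 iff X2 > 0
pi1 = (X1 > 0).astype(int)
pi2 = (X2 > 0).astype(int)

# IPW weight: indicator of full agreement divided by the product of propensities
match = (D1 == pi1) & (D2 == pi2)
prop = np.where(D1 == 1, e1, 1 - e1) * np.where(D2 == 1, e2, 1 - e2)
w = match / prop

print(f"share of trajectories with nonzero weight: {match.mean():.2f}")
print(f"IPW value estimate: {np.mean(w * Y):.2f}  (max weight {w.max():.1f})")
```

Doubly robust (AIPW) scores, as used in the main paper above, mitigate this by adding an outcome-regression term so that every observation contributes to the value estimate.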
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.