Policy Learning for Optimal Individualized Dose Intervals
- URL: http://arxiv.org/abs/2202.12234v1
- Date: Thu, 24 Feb 2022 17:59:20 GMT
- Title: Policy Learning for Optimal Individualized Dose Intervals
- Authors: Guanhua Chen, Xiaomao Li, Menggang Yu
- Abstract summary: We propose a new method to estimate such a policy.
We prove that our estimated policy is consistent, and its risk converges to that of the best-in-class policy at a root-n rate.
- Score: 3.9801611649762263
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of learning individualized dose intervals using
observational data. There are very few previous works for policy learning with
continuous treatment, and all of them focused on recommending an optimal dose
rather than an optimal dose interval. In this paper, we propose a new method to
estimate such an optimal dose interval, named probability dose interval (PDI).
The potential outcomes for doses in the PDI are guaranteed better than a
pre-specified threshold with a given probability (e.g., 50%). The associated
nonconvex optimization problem can be efficiently solved by the
Difference-of-Convex functions (DC) algorithm. We prove that our estimated
policy is consistent, and its risk converges to that of the best-in-class
policy at a root-n rate. Numerical simulations show the advantage of the
proposed method over outcome modeling based benchmarks. We further demonstrate
the performance of our method in determining individualized Hemoglobin A1c
(HbA1c) control intervals for elderly patients with diabetes.
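The abstract attributes the optimization to the Difference-of-Convex functions (DC) algorithm. As a rough illustration of that solver only (not of the paper's actual PDI objective), the following minimal sketch runs the generic DC iteration on an assumed toy objective f(x) = x^4 - 2x^2, split as g(x) - h(x) with both parts convex.

```python
# Minimal sketch of the Difference-of-Convex (DC) algorithm, the solver the
# paper uses for its nonconvex policy-learning objective.  The objective
# below is an assumption for illustration, not the paper's PDI loss:
# f(x) = x**4 - 2*x**2 decomposes as g(x) - h(x) with g(x) = x**4 and
# h(x) = 2*x**2, both convex.
import numpy as np

def dc_algorithm(x0, n_iter=50):
    """At each step, linearize h at x_k and minimize the convex majorizer
    g(x) - h'(x_k) * x.  Here that subproblem has the closed form
    4*x**3 = h'(x_k), i.e. x = cbrt(x_k)."""
    x = x0
    for _ in range(n_iter):
        grad_h = 4.0 * x               # h'(x_k) for h(x) = 2*x**2
        x = np.cbrt(grad_h / 4.0)      # argmin_x g(x) - grad_h * x
    return x

print(dc_algorithm(0.3))   # converges to +1, a global minimizer of f
print(dc_algorithm(-2.0))  # converges to -1 by symmetry
```

Each DC step minimizes a convex upper bound of the objective, so the iterates decrease f monotonically, which is why the approach handles nonconvex surrogate losses efficiently.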
Related papers
- Learning Robust Treatment Rules for Censored Data [14.95510487866686]
We propose two criteria for estimating optimal treatment rules.
We show improved performance compared to existing methods.
We also demonstrate the proposed method using AIDS clinical data.
arXiv Detail & Related papers (2024-08-17T09:58:58Z)
- Policy Gradient with Active Importance Sampling [55.112959067035916]
Policy gradient (PG) methods significantly benefit from IS, enabling the effective reuse of previously collected samples.
However, IS is employed in RL as a passive tool for re-weighting historical samples.
We instead look for the best behavioral policy from which to collect samples so as to reduce the policy gradient variance (see the sketch after this entry).
arXiv Detail & Related papers (2024-05-09T09:08:09Z)
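The importance-sampling mechanism the entry above refines can be illustrated with a minimal sketch: a one-step Gaussian-policy bandit (an assumption chosen for brevity) in which logged behavior-policy samples are re-weighted to estimate the target policy's gradient. The paper's criterion for actively choosing the behavior policy is not reproduced.

```python
# Minimal sketch of importance sampling (IS) inside a policy-gradient
# estimate.  The one-step Gaussian bandit and toy reward are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def gauss_logpdf(a, mu, sigma):
    """Log-density of N(mu, sigma**2) at a."""
    return -0.5 * ((a - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

mu_target, mu_behavior, sigma = 1.0, 0.0, 1.0
actions = rng.normal(mu_behavior, sigma, size=10_000)  # logged behavior samples
rewards = -(actions - 2.0) ** 2                        # assumed toy reward

# Importance weights re-weight behavior-policy samples toward the target.
weights = np.exp(gauss_logpdf(actions, mu_target, sigma)
                 - gauss_logpdf(actions, mu_behavior, sigma))

# Score-function (REINFORCE) term for the target policy's mean parameter.
score = (actions - mu_target) / sigma ** 2
grad_samples = weights * rewards * score

print("IS policy-gradient estimate:", grad_samples.mean())  # analytic value: 2
print("estimator variance:", grad_samples.var())  # depends on behavior policy
```

The printed variance changes with the choice of mu_behavior, which is precisely the quantity the active-sampling approach seeks to minimize.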
- Robust Learning for Optimal Dynamic Treatment Regimes with Observational Data [0.0]
We study statistical learning of optimal dynamic treatment regimes (DTRs) that guide the optimal treatment assignment for each individual at each stage based on the individual's history.
We propose a step-wise doubly-robust approach to learn the optimal DTR using observational data under the assumption of sequential ignorability (see the single-stage sketch after this entry).
arXiv Detail & Related papers (2024-03-30T02:33:39Z)
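As context for the doubly-robust approach in the entry above, here is a minimal single-stage sketch of the AIPW (augmented inverse-propensity weighting) value estimate on simulated data. The paper's step-wise, multi-stage construction is not reproduced, and all data-generating choices below are assumptions.

```python
# Minimal single-stage sketch of the doubly-robust (AIPW) estimate that
# step-wise DTR learning builds on.  All data below are simulated assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(1)
n = 5_000
x = rng.normal(size=(n, 2))
propensity_true = 1 / (1 + np.exp(-x[:, 0]))
a = rng.binomial(1, propensity_true)                  # observed treatment
y = x[:, 0] + a * (1 + x[:, 1]) + rng.normal(size=n)  # observed outcome

pi_hat = LogisticRegression().fit(x, a).predict_proba(x)[:, 1]
mu1_hat = LinearRegression().fit(x[a == 1], y[a == 1]).predict(x)

# AIPW estimate of E[Y(1)]: outcome model plus an IPW-weighted residual
# correction; consistent if either nuisance model is correctly specified.
aipw = mu1_hat + (a / pi_hat) * (y - mu1_hat)
print("E[Y(1)] estimate:", aipw.mean())  # true value is 1 in this simulation
```

The "doubly robust" property comes from the residual-correction term: bias in the outcome model is repaired by the propensity weights, and vice versa.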
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization (see the sketch after this entry).
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
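A minimal fixed-sample illustration of a likelihood-ratio confidence set for a Gaussian mean follows. The anytime-valid confidence sequences in the entry above rest on a more careful martingale construction, so treat this classical Wilks-style version purely as background under assumed Gaussian data.

```python
# Minimal sketch of a likelihood-ratio confidence set for a Gaussian mean.
# The data-generating choices are assumptions; the paper's sequential
# (anytime-valid) construction is not reproduced here.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
data = rng.normal(loc=0.5, scale=1.0, size=200)

def neg2_log_lr(theta):
    """-2 log [ L(theta) / L(theta_hat) ] for N(theta, 1) data."""
    theta_hat = data.mean()
    return len(data) * (theta_hat - theta) ** 2

# Keep every theta whose likelihood is not too far below the maximum.
grid = np.linspace(-1.0, 2.0, 601)
inside = grid[neg2_log_lr(grid) <= chi2.ppf(0.95, df=1)]
print(f"95% LR confidence set: [{inside.min():.3f}, {inside.max():.3f}]")
```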
- Reliable Off-Policy Learning for Dosage Combinations [27.385663284378854]
Decision-making in personalized medicine often requires choosing dosage combinations.
We propose a novel method for reliable off-policy learning for dosage combinations.
arXiv Detail & Related papers (2023-05-31T11:08:43Z)
- Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule.
Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded.
We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates (see the sketch after this entry).
arXiv Detail & Related papers (2022-12-19T22:43:08Z)
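The pessimism principle in the entry above can be sketched in a few lines: rank candidate policies by a lower confidence bound on their estimated value rather than by the point estimate. The candidate policies, simulated value samples, and normal-approximation bound below are assumptions; PPL's generalized empirical-Bernstein bound is not reproduced.

```python
# Minimal sketch of LCB-based (pessimistic) policy selection.  The candidate
# policies and per-policy value samples are simulated stand-ins.
import numpy as np

rng = np.random.default_rng(3)
candidates = ["always-treat", "treat-if-old", "never-treat"]
samples = {name: rng.normal(mu, s, size=n)
           for name, mu, s, n in [("always-treat", 1.0, 2.0, 40),
                                  ("treat-if-old", 0.9, 0.5, 400),
                                  ("never-treat", 0.2, 0.5, 400)]}

def lcb(v, z=1.96):
    """Normal-approximation lower confidence bound on the mean value."""
    return v.mean() - z * v.std(ddof=1) / np.sqrt(len(v))

best_point = max(candidates, key=lambda c: samples[c].mean())
best_lcb = max(candidates, key=lambda c: lcb(samples[c]))
print("point-estimate pick:", best_point)  # may favor a poorly explored policy
print("LCB (pessimistic) pick:", best_lcb)  # penalizes high uncertainty
```

By penalizing policies whose value is estimated from few or noisy samples, the LCB criterion avoids the uniform-overlap requirement that point-estimate methods implicitly rely on.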
- Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
arXiv Detail & Related papers (2022-01-31T20:58:47Z)
- CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning.
We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z)
- Kernel Assisted Learning for Personalized Dose Finding [20.52632915107782]
An individualized dose rule recommends a dose level within a continuous safe dose range based on patient level information.
In this article, we propose a kernel assisted learning method for estimating the optimal individualized dose rule (see the sketch after this entry).
arXiv Detail & Related papers (2020-07-19T23:03:26Z)
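A minimal sketch of the kernel idea behind the entry above: smooth outcomes over nearby observed doses and pick the dose with the highest estimated mean outcome. The Nadaraya-Watson estimator and simulated data are assumptions, and covariate-dependent (individualized) rules are omitted for brevity.

```python
# Minimal sketch of kernel smoothing for dose finding.  The dose-response
# data are simulated assumptions; the paper's individualized estimator is
# not reproduced.
import numpy as np

rng = np.random.default_rng(4)
doses = rng.uniform(0, 10, size=500)
outcomes = -(doses - 6.0) ** 2 + rng.normal(scale=2.0, size=500)

def nw_estimate(d, bandwidth=0.5):
    """Nadaraya-Watson estimate of E[outcome | dose = d]."""
    w = np.exp(-0.5 * ((doses - d) / bandwidth) ** 2)
    return (w * outcomes).sum() / w.sum()

grid = np.linspace(0, 10, 201)
values = np.array([nw_estimate(d) for d in grid])
print("estimated optimal dose:", grid[values.argmax()])  # near the true 6.0
```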
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
We consider the off-policy evaluation problem of estimating the cumulative value of a new target policy from logged history (see the sketch after this entry).
arXiv Detail & Related papers (2020-02-21T19:20:57Z)
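As background for the entry above, here is a minimal sketch of off-policy evaluation with linear function approximation using LSTD-Q on a toy two-state MDP. The MDP, behavior policy, and one-hot features are assumptions, and the paper's minimax analysis is not reproduced.

```python
# Minimal sketch of off-policy evaluation with linear function approximation
# via LSTD-Q: regress toward r + gamma * Q(s', pi(s')) using logged
# transitions.  The toy MDP and feature map are assumptions.
import numpy as np

rng = np.random.default_rng(5)
n_states, n_actions, gamma = 2, 2, 0.9
phi = np.eye(n_states * n_actions)  # one-hot features for each (s, a) pair
feat = lambda s, a: phi[s * n_actions + a]
target_action = lambda s: 0         # deterministic target policy

# Logged transitions from a uniform behavior policy on a toy MDP where
# action 0 keeps the state and pays +1, action 1 flips it and pays 0.
A = np.zeros((4, 4))
b = np.zeros(4)
s = 0
for _ in range(5_000):
    a = rng.integers(n_actions)
    r = 1.0 if a == 0 else 0.0
    s_next = s if a == 0 else 1 - s
    x, x_next = feat(s, a), feat(s_next, target_action(s_next))
    A += np.outer(x, x - gamma * x_next)
    b += x * r
    s = s_next

w = np.linalg.solve(A, b)
print("Q^pi(s=0, a=0):", w @ feat(0, 0))  # analytic value: 1 / (1 - 0.9) = 10
```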
- Multicategory Angle-based Learning for Estimating Optimal Dynamic Treatment Regimes with Censored Data [12.499787110182632]
An optimal dynamic treatment regime (DTR) consists of a sequence of decision rules aimed at maximizing long-term benefits.
In this paper, we develop a novel angle-based approach to target the optimal DTR under a multicategory treatment framework.
Our numerical studies show that the proposed method outperforms competing methods in terms of maximizing the conditional survival function.
arXiv Detail & Related papers (2020-01-14T05:19:15Z)