Policy Learning for Optimal Individualized Dose Intervals
- URL: http://arxiv.org/abs/2202.12234v1
- Date: Thu, 24 Feb 2022 17:59:20 GMT
- Title: Policy Learning for Optimal Individualized Dose Intervals
- Authors: Guanhua Chen, Xiaomao Li, Menggang Yu
- Abstract summary: We propose a new method to estimate such a policy.
We prove that our estimated policy is consistent, and its risk converges to that of the best-in-class policy at a root-n rate.
- Score: 3.9801611649762263
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of learning individualized dose intervals using
observational data. There are very few previous works for policy learning with
continuous treatment, and all of them focused on recommending an optimal dose
rather than an optimal dose interval. In this paper, we propose a new method to
estimate such an optimal dose interval, named probability dose interval (PDI).
The potential outcomes for doses in the PDI are guaranteed better than a
pre-specified threshold with a given probability (e.g., 50%). The associated
nonconvex optimization problem can be efficiently solved by the
Difference-of-Convex functions (DC) algorithm. We prove that our estimated
policy is consistent, and its risk converges to that of the best-in-class
policy at a root-n rate. Numerical simulations show the advantage of the
proposed method over outcome modeling based benchmarks. We further demonstrate
the performance of our method in determining individualized Hemoglobin A1c
(HbA1c) control intervals for elderly patients with diabetes.
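The abstract attributes the optimization to the Difference-of-Convex functions (DC) algorithm. As a rough illustration of that solver only (not of the paper's actual PDI objective), the following minimal sketch runs the generic DC iteration on an assumed toy objective f(x) = x^4 - 2x^2, split as g(x) - h(x) with both parts convex.

```python
# Minimal sketch of the Difference-of-Convex (DC) algorithm, the solver the
# paper uses for its nonconvex policy-learning objective.  The objective
# below is an assumption for illustration, not the paper's PDI loss:
# f(x) = x**4 - 2*x**2 decomposes as g(x) - h(x) with g(x) = x**4 and
# h(x) = 2*x**2, both convex.
import numpy as np

def dc_algorithm(x0, n_iter=50):
    """At each step, linearize h at x_k and minimize the convex majorizer
    g(x) - h'(x_k) * x.  Here that subproblem has the closed form
    4*x**3 = h'(x_k), i.e. x = cbrt(x_k)."""
    x = x0
    for _ in range(n_iter):
        grad_h = 4.0 * x               # h'(x_k) for h(x) = 2*x**2
        x = np.cbrt(grad_h / 4.0)      # argmin_x g(x) - grad_h * x
    return x

print(dc_algorithm(0.3))   # converges to +1, a global minimizer of f
print(dc_algorithm(-2.0))  # converges to -1 by symmetry
```

Each DC step minimizes a convex upper bound of the objective, so the iterates decrease f monotonically, which is why the approach handles nonconvex surrogate losses efficiently.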
Related papers
- Learning Robust Treatment Rules for Censored Data [14.95510487866686]
We propose two criteria for estimating optimal treatment rules.
We show improved performance compared to existing methods.
We also demonstrate the proposed method using AIDS clinical data.
arXiv Detail & Related papers (2024-08-17T09:58:58Z)
- Policy Gradient with Active Importance Sampling [55.112959067035916]
Policy gradient (PG) methods significantly benefit from IS, enabling the effective reuse of previously collected samples.
However, IS is employed in RL as a passive tool for re-weighting historical samples.
We instead look for the best behavioral policy from which to collect samples so as to reduce the policy gradient variance (see the sketch after this entry).
arXiv Detail & Related papers (2024-05-09T09:08:09Z)
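The importance-sampling mechanism the entry above refines can be illustrated with a minimal sketch: a one-step Gaussian-policy bandit (an assumption chosen for brevity) in which logged behavior-policy samples are re-weighted to estimate the target policy's gradient. The paper's criterion for actively choosing the behavior policy is not reproduced.

```python
# Minimal sketch of importance sampling (IS) inside a policy-gradient
# estimate.  The one-step Gaussian bandit and toy reward are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def gauss_logpdf(a, mu, sigma):
    """Log-density of N(mu, sigma**2) at a."""
    return -0.5 * ((a - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

mu_target, mu_behavior, sigma = 1.0, 0.0, 1.0
actions = rng.normal(mu_behavior, sigma, size=10_000)  # logged behavior samples
rewards = -(actions - 2.0) ** 2                        # assumed toy reward

# Importance weights re-weight behavior-policy samples toward the target.
weights = np.exp(gauss_logpdf(actions, mu_target, sigma)
                 - gauss_logpdf(actions, mu_behavior, sigma))

# Score-function (REINFORCE) term for the target policy's mean parameter.
score = (actions - mu_target) / sigma ** 2
grad_samples = weights * rewards * score

print("IS policy-gradient estimate:", grad_samples.mean())  # analytic value: 2
print("estimator variance:", grad_samples.var())  # depends on behavior policy
```

The printed variance changes with the choice of mu_behavior, which is precisely the quantity the active-sampling approach seeks to minimize.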
- Robust Learning for Optimal Dynamic Treatment Regimes with Observational Data [0.0]
We study statistical learning of optimal dynamic treatment regimes (DTRs) that guide the optimal treatment assignment for each individual at each stage based on the individual's history.
We propose a step-wise doubly-robust approach to learn the optimal DTR using observational data under the assumption of sequential ignorability (see the single-stage sketch after this entry).
arXiv Detail & Related papers (2024-03-30T02:33:39Z)
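As context for the doubly-robust approach in the entry above, here is a minimal single-stage sketch of the AIPW (augmented inverse-propensity weighting) value estimate on simulated data. The paper's step-wise, multi-stage construction is not reproduced, and all data-generating choices below are assumptions.

```python
# Minimal single-stage sketch of the doubly-robust (AIPW) estimate that
# step-wise DTR learning builds on.  All data below are simulated assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(1)
n = 5_000
x = rng.normal(size=(n, 2))
propensity_true = 1 / (1 + np.exp(-x[:, 0]))
a = rng.binomial(1, propensity_true)                  # observed treatment
y = x[:, 0] + a * (1 + x[:, 1]) + rng.normal(size=n)  # observed outcome

pi_hat = LogisticRegression().fit(x, a).predict_proba(x)[:, 1]
mu1_hat = LinearRegression().fit(x[a == 1], y[a == 1]).predict(x)

# AIPW estimate of E[Y(1)]: outcome model plus an IPW-weighted residual
# correction; consistent if either nuisance model is correctly specified.
aipw = mu1_hat + (a / pi_hat) * (y - mu1_hat)
print("E[Y(1)] estimate:", aipw.mean())  # true value is 1 in this simulation
```

The "doubly robust" property comes from the residual-correction term: bias in the outcome model is repaired by the propensity weights, and vice versa.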
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization (see the sketch after this entry).
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
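A minimal fixed-sample illustration of a likelihood-ratio confidence set for a Gaussian mean follows. The anytime-valid confidence sequences in the entry above rest on a more careful martingale construction, so treat this classical Wilks-style version purely as background under assumed Gaussian data.

```python
# Minimal sketch of a likelihood-ratio confidence set for a Gaussian mean.
# The data-generating choices are assumptions; the paper's sequential
# (anytime-valid) construction is not reproduced here.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
data = rng.normal(loc=0.5, scale=1.0, size=200)

def neg2_log_lr(theta):
    """-2 log [ L(theta) / L(theta_hat) ] for N(theta, 1) data."""
    theta_hat = data.mean()
    return len(data) * (theta_hat - theta) ** 2

# Keep every theta whose likelihood is not too far below the maximum.
grid = np.linspace(-1.0, 2.0, 601)
inside = grid[neg2_log_lr(grid) <= chi2.ppf(0.95, df=1)]
print(f"95% LR confidence set: [{inside.min():.3f}, {inside.max():.3f}]")
```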
- Reliable Off-Policy Learning for Dosage Combinations [27.385663284378854]
Decision-making in personalized medicine often requires choosing dosage combinations.
We propose a novel method for reliable off-policy learning for dosage combinations.
arXiv Detail & Related papers (2023-05-31T11:08:43Z)
- Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule.
Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded.
We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates (see the sketch after this entry).
arXiv Detail & Related papers (2022-12-19T22:43:08Z)
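The pessimism principle in the entry above can be sketched in a few lines: rank candidate policies by a lower confidence bound on their estimated value rather than by the point estimate. The candidate policies, simulated value samples, and normal-approximation bound below are assumptions; PPL's generalized empirical-Bernstein bound is not reproduced.

```python
# Minimal sketch of LCB-based (pessimistic) policy selection.  The candidate
# policies and per-policy value samples are simulated stand-ins.
import numpy as np

rng = np.random.default_rng(3)
candidates = ["always-treat", "treat-if-old", "never-treat"]
samples = {name: rng.normal(mu, s, size=n)
           for name, mu, s, n in [("always-treat", 1.0, 2.0, 40),
                                  ("treat-if-old", 0.9, 0.5, 400),
                                  ("never-treat", 0.2, 0.5, 400)]}

def lcb(v, z=1.96):
    """Normal-approximation lower confidence bound on the mean value."""
    return v.mean() - z * v.std(ddof=1) / np.sqrt(len(v))

best_point = max(candidates, key=lambda c: samples[c].mean())
best_lcb = max(candidates, key=lambda c: lcb(samples[c]))
print("point-estimate pick:", best_point)  # may favor a poorly explored policy
print("LCB (pessimistic) pick:", best_lcb)  # penalizes high uncertainty
```

By penalizing policies whose value is estimated from few or noisy samples, the LCB criterion avoids the uniform-overlap requirement that point-estimate methods implicitly rely on.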
- Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
arXiv Detail & Related papers (2022-01-31T20:58:47Z)
- CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning.
We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z)
- Kernel Assisted Learning for Personalized Dose Finding [20.52632915107782]
An individualized dose rule recommends a dose level within a continuous safe dose range based on patient level information.
In this article, we propose a kernel assisted learning method for estimating the optimal individualized dose rule (see the sketch after this entry).
arXiv Detail & Related papers (2020-07-19T23:03:26Z)
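A minimal sketch of the kernel idea behind the entry above: smooth outcomes over nearby observed doses and pick the dose with the highest estimated mean outcome. The Nadaraya-Watson estimator and simulated data are assumptions, and covariate-dependent (individualized) rules are omitted for brevity.

```python
# Minimal sketch of kernel smoothing for dose finding.  The dose-response
# data are simulated assumptions; the paper's individualized estimator is
# not reproduced.
import numpy as np

rng = np.random.default_rng(4)
doses = rng.uniform(0, 10, size=500)
outcomes = -(doses - 6.0) ** 2 + rng.normal(scale=2.0, size=500)

def nw_estimate(d, bandwidth=0.5):
    """Nadaraya-Watson estimate of E[outcome | dose = d]."""
    w = np.exp(-0.5 * ((doses - d) / bandwidth) ** 2)
    return (w * outcomes).sum() / w.sum()

grid = np.linspace(0, 10, 201)
values = np.array([nw_estimate(d) for d in grid])
print("estimated optimal dose:", grid[values.argmax()])  # near the true 6.0
```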
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
We consider the off-policy evaluation problem of estimating the cumulative value of a new target policy from logged history (see the sketch after this entry).
arXiv Detail & Related papers (2020-02-21T19:20:57Z)
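As background for the entry above, here is a minimal sketch of off-policy evaluation with linear function approximation using LSTD-Q on a toy two-state MDP. The MDP, behavior policy, and one-hot features are assumptions, and the paper's minimax analysis is not reproduced.

```python
# Minimal sketch of off-policy evaluation with linear function approximation
# via LSTD-Q: regress toward r + gamma * Q(s', pi(s')) using logged
# transitions.  The toy MDP and feature map are assumptions.
import numpy as np

rng = np.random.default_rng(5)
n_states, n_actions, gamma = 2, 2, 0.9
phi = np.eye(n_states * n_actions)  # one-hot features for each (s, a) pair
feat = lambda s, a: phi[s * n_actions + a]
target_action = lambda s: 0         # deterministic target policy

# Logged transitions from a uniform behavior policy on a toy MDP where
# action 0 keeps the state and pays +1, action 1 flips it and pays 0.
A = np.zeros((4, 4))
b = np.zeros(4)
s = 0
for _ in range(5_000):
    a = rng.integers(n_actions)
    r = 1.0 if a == 0 else 0.0
    s_next = s if a == 0 else 1 - s
    x, x_next = feat(s, a), feat(s_next, target_action(s_next))
    A += np.outer(x, x - gamma * x_next)
    b += x * r
    s = s_next

w = np.linalg.solve(A, b)
print("Q^pi(s=0, a=0):", w @ feat(0, 0))  # analytic value: 1 / (1 - 0.9) = 10
```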
- Multicategory Angle-based Learning for Estimating Optimal Dynamic Treatment Regimes with Censored Data [12.499787110182632]
An optimal dynamic treatment regime (DTR) consists of a sequence of decision rules aimed at maximizing long-term benefits.
In this paper, we develop a novel angle-based approach to target the optimal DTR under a multicategory treatment framework.
Our numerical studies show that the proposed method outperforms competing methods in terms of maximizing the conditional survival function.
arXiv Detail & Related papers (2020-01-14T05:19:15Z)