A Penalized Shared-parameter Algorithm for Estimating Optimal Dynamic Treatment Regimens
- URL: http://arxiv.org/abs/2107.07875v1
- Date: Tue, 13 Jul 2021 05:31:14 GMT
- Title: A Penalized Shared-parameter Algorithm for Estimating Optimal Dynamic Treatment Regimens
- Authors: Trikay Nalamada, Shruti Agarwal, Maria Jahja, Bibhas Chakraborty and Palash Ghosh
- Abstract summary: We show that the existing Q-shared algorithm can suffer from non-convergence due to the use of linear models in the Q-learning setup.
We give evidence for the proposed method in a real-world application and several synthetic simulations.
- Score: 3.9023554886892438
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A dynamic treatment regimen (DTR) is a set of decision rules to personalize
treatments for an individual using their medical history. The Q-learning based
Q-shared algorithm has been used to develop DTRs that involve decision rules
shared across multiple stages of intervention. We show that the existing
Q-shared algorithm can suffer from non-convergence due to the use of linear
models in the Q-learning setup, and identify the condition in which Q-shared
fails. Leveraging properties from expansion-constrained ordinary least-squares,
we give a penalized Q-shared algorithm that not only converges in settings that
violate the condition, but can outperform the original Q-shared algorithm even
when the condition is satisfied. We give evidence for the proposed method in a
real-world application and several synthetic simulations.
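To make the proposal concrete, below is a minimal sketch of a ridge-penalized, shared-parameter Q-learning iteration for a two-stage DTR with linear Q-functions, in the spirit of the abstract above. It is illustrative only: the function name, the data layout, and the plain ridge penalty `lam` are assumptions, and the paper's actual penalty, derived from expansion-constrained ordinary least-squares, need not coincide with it.

```python
import numpy as np

def penalized_q_shared(X1, X2, X2_all, y1, y2, lam=1.0, max_iter=500, tol=1e-8):
    """Illustrative ridge-penalized shared-parameter Q-learning (two stages).

    X1     : (n, p) stage-1 features phi(h1, a1) at the observed actions
    X2     : (n, p) stage-2 features phi(h2, a2) at the observed actions
    X2_all : (n, K, p) stage-2 features for each of the K candidate actions
    y1     : (n,) stage-1 intermediate rewards (zeros if none)
    y2     : (n,) final outcomes
    lam    : ridge penalty; lam = 0 gives an unpenalized shared-parameter iteration
    """
    n, p = X1.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        # Stage-1 pseudo-outcome: immediate reward plus the best predicted stage-2 Q-value.
        pseudo1 = y1 + (X2_all @ beta).max(axis=1)
        # Shared parameters: stack both stages into a single penalized least-squares fit.
        X = np.vstack([X1, X2])
        y = np.concatenate([pseudo1, y2])
        beta_new = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
        if np.linalg.norm(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta
```

The estimated regimen then assigns, at each stage, the action maximizing phi(h, a) @ beta. Setting lam = 0 corresponds to an unpenalized shared-parameter iteration of the kind the abstract describes as potentially non-convergent when its condition fails; in the paper's penalized variant, a suitably chosen penalty restores convergence.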
Related papers
- Variance-Reduced Cascade Q-learning: Algorithms and Sample Complexity [3.4376560669160394]
We introduce and analyze a novel model-free algorithm called Variance-Reduced Cascade Q-learning (VRCQ).
VRCQ provides superior guarantees in the $\ell_\infty$-norm compared with the existing model-free approximation-type algorithms.
arXiv Detail & Related papers (2024-08-13T00:34:33Z)
- Two-Step Q-Learning [0.0]
The paper proposes a novel off-policy two-step Q-learning algorithm, without importance sampling.
Numerical experiments demonstrate the superior performance of both the two-step Q-learning and its smooth variants.
arXiv Detail & Related papers (2024-07-02T15:39:00Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets).
Our empirical results demonstrate the efficacy of this approach, also testing the model in unstable environments.
arXiv Detail & Related papers (2023-11-05T12:03:58Z)
- Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online oracle that combines the certainty-equivalence principle and polytopic tubes.
We analyze the regret of the algorithm, when compared to an algorithm initially aware of the system dynamics.
arXiv Detail & Related papers (2023-10-07T15:07:10Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
A sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- A Data-Driven State Aggregation Approach for Dynamic Discrete Choice Models [7.7347261505610865]
We present a novel algorithm that provides a data-driven method for selecting and aggregating states.
The proposed two-stage approach mitigates the curse of dimensionality by reducing the problem dimension.
We demonstrate the empirical performance of the algorithm in two classic dynamic discrete choice estimation applications.
arXiv Detail & Related papers (2023-04-11T01:07:24Z)
- Differentially Private Deep Q-Learning for Pattern Privacy Preservation in MEC Offloading [76.0572817182483]
In MEC offloading, attackers may eavesdrop on the offloading decisions to infer the edge server's (ES's) queue information and users' usage patterns.
We propose an offloading strategy which jointly minimizes the latency, the ES's energy consumption, and the task dropping rate, while preserving pattern privacy (PP).
We develop a Differential Privacy Deep Q-learning based Offloading (DP-DQO) algorithm to solve this problem while addressing the PP issue by injecting noise into the generated offloading decisions.
arXiv Detail & Related papers (2023-02-09T12:50:18Z)
- Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls [2.922007656878633]
We propose Q-learning algorithms for continuous-time deterministic optimal control problems with Lipschitz continuous controls.
A novel semi-discrete version of the HJB equation is proposed to design a Q-learning algorithm that uses data collected in discrete time without discretizing or approximating the system dynamics.
arXiv Detail & Related papers (2020-10-27T06:11:04Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Boosting Algorithms for Estimating Optimal Individualized Treatment Rules [4.898659895355356]
We present nonparametric algorithms for estimating optimal individualized treatment rules.
The proposed algorithms are based on the XGBoost algorithm, one of the most powerful algorithms in the machine learning literature; a generic sketch of this regression-and-compare idea appears immediately below.
arXiv Detail & Related papers (2020-01-31T22:26:38Z)
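For the boosting-based individualized treatment rule entry above, a common way to operationalize the idea is to fit one outcome model per treatment arm and recommend the arm with the largest predicted outcome. The sketch below uses XGBoost as the base learner, but it is a generic, hypothetical illustration of that regression-and-compare recipe, not the specific boosting algorithm developed in that paper; all function names and hyperparameters are assumptions.

```python
import numpy as np
from xgboost import XGBRegressor  # base learner named in the entry above

def fit_outcome_models(X, a, y):
    """Fit one gradient-boosted outcome model per observed treatment arm."""
    models = {}
    for arm in np.unique(a):
        mask = (a == arm)
        model = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.1)
        model.fit(X[mask], y[mask])
        models[arm] = model
    return models

def recommend(models, X_new):
    """Recommend, for each subject, the arm with the largest predicted outcome."""
    arm_list = sorted(models)
    preds = np.column_stack([models[arm].predict(X_new) for arm in arm_list])
    return np.asarray(arm_list)[preds.argmax(axis=1)]
```

The sketch assumes randomized treatment assignment; with observational data one would additionally adjust for confounding, and in practice the hyperparameters would be tuned by cross-validation.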
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.