Semi-Markov Offline Reinforcement Learning for Healthcare
- URL: http://arxiv.org/abs/2203.09365v2
- Date: Mon, 21 Mar 2022 02:52:30 GMT
- Title: Semi-Markov Offline Reinforcement Learning for Healthcare
- Authors: Mehdi Fatemi and Mary Wu and Jeremy Petch and Walter Nelson and Stuart
J. Connolly and Alexander Benz and Anthony Carnicelli and Marzyeh Ghassemi
- Abstract summary: We introduce three offline RL algorithms, namely, SDQN, SDDQN, and SBCQ.
We experimentally demonstrate that only these algorithms learn the optimal policy in variable-time environments.
We apply our new algorithms to a real-world offline dataset pertaining to warfarin dosing for stroke prevention.
- Score: 57.15307499843254
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) tasks are typically framed as Markov Decision
Processes (MDPs), assuming that decisions are made at fixed time intervals.
However, many applications of great importance, including healthcare, do not
satisfy this assumption, yet they are commonly modelled as MDPs after an
artificial reshaping of the data. In addition, most healthcare (and similar)
problems are offline by nature, allowing for only retrospective studies. To
address both challenges, we begin by discussing the Semi-MDP (SMDP) framework,
which formally handles actions of variable timings. We next present a formal
way to apply SMDP modifications to nearly any given value-based offline RL
method. We use this theory to introduce three SMDP-based offline RL algorithms,
namely, SDQN, SDDQN, and SBCQ. We then experimentally demonstrate that only
these SMDP-based algorithms learn the optimal policy in variable-time
environments, whereas their MDP counterparts do not. Finally, we apply our new
algorithms to a real-world offline dataset pertaining to warfarin dosing for
stroke prevention and demonstrate similar results.
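The core modification behind SDQN, SDDQN, and SBCQ, as the abstract describes it, is to replace the fixed-interval MDP bootstrap with an SMDP one: the next-state value is discounted by gamma**tau, where tau is the variable time elapsed until the next decision, and the reward is the return accumulated over that interval. Below is a minimal, hedged sketch of that standard SMDP-style target in a tabular setting; it is not the paper's implementation, and the transition fields, table sizes, and synthetic batch are illustrative assumptions.

```python
# A minimal sketch (not the paper's SDQN/SDDQN/SBCQ code) of the SMDP-style
# Q-learning target: the bootstrap term is discounted by gamma**tau, where tau
# is the variable number of time steps until the next decision, and R is the
# reward accumulated over that interval.
# The tabular setting, field names, and synthetic batch are assumptions.
import numpy as np

def smdp_q_sweep(Q, batch, gamma=0.99, lr=0.1):
    """One sweep of tabular SMDP-style Q-learning over logged transitions.

    Each transition is (s, a, R, s_next, tau, done), where
      R   = sum_{k=0}^{tau-1} gamma**k * r_{t+k}  (return over the action's duration)
      tau = elapsed time steps until the next decision point.
    """
    for s, a, R, s_next, tau, done in batch:
        bootstrap = 0.0 if done else (gamma ** tau) * Q[s_next].max()
        target = R + bootstrap                    # SMDP target: gamma**tau, not gamma
        Q[s, a] += lr * (target - Q[s, a])        # plain TD step toward that target
    return Q

# Tiny usage example on synthetic logged data (purely illustrative).
rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))
batch = [(int(rng.integers(n_states)), int(rng.integers(n_actions)),
          float(rng.normal()), int(rng.integers(n_states)),
          int(rng.integers(1, 6)), bool(rng.random() < 0.1))
         for _ in range(200)]
Q = smdp_q_sweep(Q, batch)
```

In the deep, offline variants the abstract names, the same gamma**tau bootstrap would presumably replace the usual gamma-discounted target inside DQN-, Double-DQN-, and BCQ-style losses computed on the logged dataset.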
Related papers
- Tractable Offline Learning of Regular Decision Processes [50.11277112628193]
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs).
In RDPs, the unknown dependency of future observations and rewards on past interactions can be captured by a hidden finite-state automaton.
Many algorithms first reconstruct this unknown dependency using automata learning techniques.
arXiv Detail & Related papers (2024-09-04T14:26:58Z) - Twice Regularized Markov Decision Processes: The Equivalence between
Robustness and Regularization [64.60253456266872]
Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics.
Regularized MDPs show more stability in policy learning without impairing time complexity.
Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
arXiv Detail & Related papers (2023-03-12T13:03:28Z) - Reinforcement Learning in the Wild with Maximum Likelihood-based Model
Transfer [5.92353064090273]
We study the problem of transferring the available Markov Decision Process (MDP) models to learn and plan efficiently in an unknown but similar MDP.
We propose a generic two-stage algorithm, MLEMTRL, to address the MTRL problem in discrete and continuous settings.
We empirically demonstrate that MLEMTRL allows faster learning in new MDPs than learning from scratch and achieves near-optimal performance.
arXiv Detail & Related papers (2023-02-18T09:47:34Z) - The Impact of Task Underspecification in Evaluating Deep Reinforcement
Learning [1.4711121887106535]
Evaluations of Deep Reinforcement Learning (DRL) methods are an integral part of the field's scientific progress.
In this article, we augment DRL evaluations to consider parameterized families of MDPs.
We show that evaluating the MDP family often yields a substantially different relative ranking of methods, casting doubt on what methods should be considered state-of-the-art.
arXiv Detail & Related papers (2022-10-16T18:51:55Z) - Twice regularized MDPs and the equivalence between robustness and
regularization [65.58188361659073]
We show that policy iteration on reward-robust MDPs can have the same time complexity as on regularized MDPs.
We generalize regularized MDPs to twice regularized MDPs.
arXiv Detail & Related papers (2021-10-12T18:33:45Z) - Safe Exploration by Solving Early Terminated MDP [77.10563395197045]
We introduce a new approach to address safe RL problems under the framework of the Early Terminated MDP (ET-MDP).
We first define the ET-MDP as an unconstrained MDP with the same optimal value function as its corresponding CMDP.
An off-policy algorithm based on context models is then proposed to solve the ET-MDP, thereby solving the corresponding CMDP with better performance and improved learning efficiency (a hedged sketch of this early-termination idea appears after the list).
arXiv Detail & Related papers (2021-07-09T04:24:40Z) - Sample Efficient Reinforcement Learning In Continuous State Spaces: A
Perspective Beyond Linearity [50.38337893712897]
We introduce the Effective Planning Window (EPW) condition, a structural condition on MDPs that makes no linearity assumptions.
We demonstrate that the EPW condition permits sample efficient RL, by providing an algorithm which provably solves MDPs satisfying this condition.
We additionally show the necessity of conditions like EPW, by demonstrating that simple MDPs with slight nonlinearities cannot be solved sample efficiently.
arXiv Detail & Related papers (2021-06-15T00:06:59Z)