Deep Offline Reinforcement Learning for Real-world Treatment
Optimization Applications
- URL: http://arxiv.org/abs/2302.07549v2
- Date: Tue, 13 Jun 2023 12:24:32 GMT
- Title: Deep Offline Reinforcement Learning for Real-world Treatment
Optimization Applications
- Authors: Milashini Nambiar and Supriyo Ghosh and Priscilla Ong and Yu En Chan
and Yong Mong Bee and Pavitra Krishnaswamy
- Abstract summary: We introduce a practical and theoretically grounded transition sampling approach to address action imbalance during offline RL training.
We perform extensive experiments on two real-world tasks for diabetes and sepsis treatment optimization.
Across a range of principled and clinically relevant metrics, we show that our proposed approach enables substantial improvements in expected health outcomes.
- Score: 3.770564448216192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is increasing interest in data-driven approaches for recommending
optimal treatment strategies in many chronic disease management and critical
care applications. Reinforcement learning methods are well-suited to this
sequential decision-making problem, but must be trained and evaluated
exclusively on retrospective medical record datasets as direct online
exploration is unsafe and infeasible. Despite this requirement, the vast
majority of treatment optimization studies use off-policy RL methods (e.g.,
Double Deep Q Networks (DDQN) or its variants) that are known to perform poorly
in purely offline settings. Recent advances in offline RL, such as Conservative
Q-Learning (CQL), offer a suitable alternative. But there remain challenges in
adapting these approaches to real-world applications where suboptimal examples
dominate the retrospective dataset and strict safety constraints need to be
satisfied. In this work, we introduce a practical and theoretically grounded
transition sampling approach to address action imbalance during offline RL
training. We perform extensive experiments on two real-world tasks for diabetes
and sepsis treatment optimization to compare performance of the proposed
approach against prominent off-policy and offline RL baselines (DDQN and CQL).
Across a range of principled and clinically relevant metrics, we show that our
proposed approach enables substantial improvements in expected health outcomes
while remaining consistent with relevant practice and safety guidelines.
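The abstract does not spell out the mechanics of the proposed transition sampling scheme, so the sketch below is only an illustration of the general idea it points to: when common or suboptimal actions dominate a retrospective dataset, minibatches for offline Q-learning can be drawn with probabilities inversely proportional to how often each logged action appears, so that rarely recorded treatments still contribute to training. All function names, signatures, and the toy data are hypothetical and not taken from the paper.
```python
# Illustrative sketch only: draw offline-RL training batches with probabilities
# inversely proportional to the frequency of each logged action, one common way
# to counter action imbalance. All names here are hypothetical.
import numpy as np

def action_balanced_probabilities(actions, num_actions, smoothing=1.0):
    """Per-transition sampling weights that up-weight transitions with rare actions."""
    counts = np.bincount(actions, minlength=num_actions).astype(float)
    inv_freq = 1.0 / (counts + smoothing)   # rare actions get larger weights
    weights = inv_freq[actions]             # one weight per logged transition
    return weights / weights.sum()          # normalize to a sampling distribution

def sample_minibatch(transitions, probs, batch_size, rng):
    """Draw a batch of transitions for an offline TD update (e.g., DDQN or CQL)."""
    idx = rng.choice(len(transitions), size=batch_size, replace=True, p=probs)
    return [transitions[i] for i in idx]

# Toy retrospective dataset of (state, action, reward, next_state) tuples.
rng = np.random.default_rng(0)
actions = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])   # action 2 is rare in the log
transitions = [(f"s{i}", int(a), 0.0, f"s{i+1}") for i, a in enumerate(actions)]
probs = action_balanced_probabilities(actions, num_actions=3)
batch = sample_minibatch(transitions, probs, batch_size=4, rng=rng)
```
With this kind of reweighting, a rarely logged but clinically relevant action is sampled far more often per occurrence than the dominant default action, which is one simple way to keep a learned Q-function from effectively ignoring it.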
Related papers
- Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care [46.2482873419289]
We introduce a deep Q-learning approach to obtain more reliable critical care policies.
We evaluate our method in off-policy and offline settings using simulated environments and real health records from intensive care units.
arXiv Detail & Related papers (2023-06-13T18:02:57Z) - Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z) - Offline RL Policies Should be Trained to be Adaptive [89.8580376798065]
We show that acting optimally in offline RL in a Bayesian sense involves solving an implicit POMDP.
As a result, optimal policies for offline RL must be adaptive, depending not just on the current state but on all the transitions seen so far during evaluation.
We present a model-free algorithm for approximating this optimal adaptive policy, and demonstrate the efficacy of learning such adaptive policies in offline RL benchmarks.
arXiv Detail & Related papers (2022-07-05T17:58:33Z) - Federated Offline Reinforcement Learning [55.326673977320574]
We propose a multi-site Markov decision process model that allows for both homogeneous and heterogeneous effects across sites.
We design the first federated policy optimization algorithm for offline RL with sample complexity guarantees.
We give a theoretical guarantee for the proposed algorithm, where the suboptimality of the learned policy is comparable to the rate obtained as if the data were not distributed.
arXiv Detail & Related papers (2022-06-11T18:03:26Z) - A Conservative Q-Learning approach for handling distribution shift in
sepsis treatment strategies [0.0]
There is no consensus on what interventions work best and different patients respond very differently to the same treatment.
Deep Reinforcement Learning methods can be used to derive optimal policies for treatment strategies that mirror physician actions.
The learned policy could help clinicians in Intensive Care Units make better decisions while treating septic patients and improve survival rates (a minimal sketch of the CQL objective appears after this list).
arXiv Detail & Related papers (2022-03-25T19:50:18Z) - Pessimistic Model Selection for Offline Deep Reinforcement Learning [56.282483586473816]
Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications.
One main barrier is overfitting, which leads to poor generalizability of the policy learned by DRL.
We propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee.
arXiv Detail & Related papers (2021-11-29T06:29:49Z) - Uncertainty-Based Offline Reinforcement Learning with Diversified
Q-Ensemble [16.92791301062903]
We propose an uncertainty-based offline RL method that takes into account the confidence of the Q-value prediction and does not require any estimation or sampling of the data distribution.
Surprisingly, we find that it is possible to substantially outperform existing offline RL methods on various tasks by simply increasing the number of Q-networks along with clipped Q-learning.
arXiv Detail & Related papers (2021-10-04T16:40:13Z) - Model Selection for Offline Reinforcement Learning: Practical
Considerations for Healthcare Settings [13.376364233897528]
Reinforcement learning can be used to learn treatment policies and aid decision making in healthcare.
A standard validation pipeline for model selection requires running a learned policy in the actual environment, which is often infeasible in healthcare settings.
Our work serves as a practical guide for offline RL model selection and can help RL practitioners select policies using real-world datasets.
arXiv Detail & Related papers (2021-07-23T02:41:51Z) - Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
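Conservative Q-Learning is the offline baseline named in the abstract above and the method applied in the sepsis paper in this list, so a minimal sketch of its update is included here for reference. It assumes a discrete action space and a PyTorch Q-network; the hyperparameters, tensor names, and helper signature are placeholders rather than the implementation used in any of the papers above.
```python
# Minimal sketch of a Conservative Q-Learning (CQL) update for a discrete
# action space, assuming a PyTorch Q-network q_net(states) -> [batch, n_actions].
# This illustrates the general CQL idea, not the exact implementation from the
# papers listed above; alpha, gamma, and all tensor names are hypothetical.
import torch
import torch.nn.functional as F

def cql_loss(q_net, target_net, batch, gamma=0.99, alpha=1.0):
    # Tensors from the logged offline dataset (dones is a float mask, actions are long).
    states, actions, rewards, next_states, dones = batch

    # Standard double-DQN-style TD target computed from logged transitions only.
    q_all = q_net(states)                                   # [B, n_actions]
    q_taken = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_actions = q_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        td_target = rewards + gamma * (1.0 - dones) * next_q
    td_loss = F.mse_loss(q_taken, td_target)

    # CQL penalty: push down Q-values over all actions (log-sum-exp) while
    # pushing up the Q-values of actions actually taken in the data.
    conservative_penalty = (torch.logsumexp(q_all, dim=1) - q_taken).mean()

    return td_loss + alpha * conservative_penalty
```
The penalty term keeps the learned values on actions the dataset rarely supports from being overestimated, which is why CQL-style objectives tend to be better behaved than plain DDQN when trained purely offline.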
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.