Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning
Approach to Critical Care
- URL: http://arxiv.org/abs/2306.08044v2
- Date: Thu, 13 Jul 2023 20:23:43 GMT
- Title: Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning
Approach to Critical Care
- Authors: Ali Shirali, Alexander Schubert, Ahmed Alaa
- Abstract summary: We introduce a deep Q-learning approach able to obtain more reliable critical care policies.
We achieve this by first pruning the action set based on all available rewards, and second training a final model based on the sparse main reward but with a restricted action set.
- Score: 68.8204255655161
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most medical treatment decisions are sequential in nature. Hence, there is
substantial hope that reinforcement learning may make it possible to formulate
precise data-driven treatment plans. However, a key challenge for most
applications in this field is the sparse nature of primarily mortality-based
reward functions, leading to decreased stability of offline estimates. In this
work, we introduce a deep Q-learning approach able to obtain more reliable
critical care policies. This method integrates relevant but noisy intermediate
biomarker signals into the reward specification, without compromising the
optimization of the main outcome of interest (e.g. patient survival). We
achieve this by first pruning the action set based on all available rewards,
and second training a final model based on the sparse main reward but with a
restricted action set. By disentangling accurate and approximated rewards
through action pruning, potential distortions of the main objective are
minimized, all while enabling the extraction of valuable information from
intermediate signals that can guide the learning process. We evaluate our
method in both off-policy and offline settings using simulated environments and
real health records of patients in intensive care units. Our empirical results
indicate that pruning significantly reduces the size of the action space while
staying mostly consistent with the actions taken by physicians, outperforming
the current state-of-the-art offline reinforcement learning method conservative
Q-learning. Our work is a step towards developing reliable policies by
effectively harnessing the wealth of available information in data-intensive
critical care environments.
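The two-stage recipe in the abstract (fit Q-functions to every available reward signal, prune actions that the signals jointly rate as poor, then train the final Q-function on the sparse main reward over the restricted action set) can be sketched in a toy tabular form. Everything below is a hypothetical illustration: the state/action sizes, reward functions, near-greedy pruning rule, and margin are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 4

def q_learning(transitions, reward_fn, allowed, alpha=0.1, gamma=0.99):
    """One pass of tabular Q-learning over logged (s, a, s') transitions,
    restricted to the `allowed` state-action mask."""
    q = np.zeros((n_states, n_actions))
    for s, a, s2 in transitions:
        if not allowed[s, a]:
            continue  # never update pruned actions
        best_next = np.max(np.where(allowed[s2], q[s2], -np.inf))
        q[s, a] += alpha * (reward_fn(s, a, s2) + gamma * best_next - q[s, a])
    return q

# Logged data: random transitions standing in for offline health records.
transitions = [(rng.integers(n_states), rng.integers(n_actions),
                rng.integers(n_states)) for _ in range(5000)]

# Stage 1: one Q-function per reward signal (sparse main + dense proxy).
main_r = lambda s, a, s2: 1.0 if s2 == n_states - 1 else 0.0  # sparse outcome
proxy_r = lambda s, a, s2: -abs(a - s2) / n_actions           # noisy biomarker proxy
all_allowed = np.ones((n_states, n_actions), dtype=bool)
q_signals = [q_learning(transitions, r, all_allowed) for r in (main_r, proxy_r)]

# Stage 2: prune -- keep an action only if every signal rates it within a
# margin of the best action in that state.
margin = 0.05
pruned = np.ones((n_states, n_actions), dtype=bool)
for q in q_signals:
    pruned &= q >= q.max(axis=1, keepdims=True) - margin

# Guard: never prune every action in a state; fall back to the main signal.
empty = ~pruned.any(axis=1)
pruned[empty, q_signals[0][empty].argmax(axis=1)] = True

# Stage 3: final model on the sparse main reward, restricted action set.
q_final = q_learning(transitions, main_r, pruned)
print(pruned.sum(), "of", pruned.size, "state-action pairs kept")
```

The property this sketch mirrors is that the proxy reward only ever removes actions; it never enters the final objective, so mis-specified intermediate signals cannot directly distort the optimization of the main outcome.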
Related papers
- Policy Optimization for Personalized Interventions in Behavioral Health [8.10897203067601]
Behavioral health interventions, delivered through digital platforms, have the potential to significantly improve health outcomes.
We study the problem of optimizing personalized interventions for patients to maximize a long-term outcome.
We present a new approach for this problem that we dub DecompPI, which decomposes the state space for a system of patients to the individual level.
arXiv Detail & Related papers (2023-03-21T21:42:03Z)
- Optimal discharge of patients from intensive care via a data-driven policy learning framework [58.720142291102135]
It is important that the patient discharge task addresses the nuanced trade-off between decreasing a patient's length of stay and the risk of readmission or even death following the discharge decision.
This work introduces an end-to-end general framework for capturing this trade-off to recommend optimal discharge timing decisions.
A data-driven approach is used to derive a parsimonious, discrete state space representation that captures a patient's physiological condition.
arXiv Detail & Related papers (2021-12-17T04:39:33Z)
- Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds [21.520045697447372]
Off-policy evaluation (OPE) is the task of estimating the expected reward of a given policy based on offline data previously collected under different policies.
This work considers the problem of constructing non-asymptotic confidence intervals in infinite-horizon off-policy evaluation.
We develop a practical algorithm through a primal-dual optimization-based approach.
arXiv Detail & Related papers (2021-03-09T22:31:20Z)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
- Semi-Supervised Off Policy Reinforcement Learning [3.48396189165489]
Health-outcome information is often not well coded but rather embedded in clinical notes.
We propose a semi-supervised learning (SSL) approach that efficiently leverages a small labeled dataset with the true outcome observed and a large unlabeled dataset with outcome surrogates.
Our method is at least as efficient as the supervised approach and, moreover, safe, as it is robust to mis-specification of the imputation models.
arXiv Detail & Related papers (2020-12-09T00:59:12Z)
- Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
- Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost Functions [80.12620331438052]
Deep learning has become an important tool for rapidly screening billions of molecules in silico for potential hits containing desired chemical features.
Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of ground truth labels in some datasets.
We argue in favor of directly optimizing the receiver operating characteristic (ROC) in such cases, due to its robustness to class imbalance.
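As a hedged aside on what directly optimizing the ROC can look like in practice (this is a generic pairwise-AUC sketch, not this paper's method; the data, scorer, and hyperparameters below are all made up), a linear scorer can be trained to rank positives above negatives:

```python
import numpy as np

# Toy data: ~20% positives, signal concentrated in the first feature.
rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 1.0).astype(int)

pos, neg = X[y == 1], X[y == 0]
diff = pos[:, None, :] - neg[None, :, :]  # (P, N, d) positive-minus-negative pairs

# Maximize a pairwise logistic surrogate of AUC: push every positive's
# score above every negative's score.
w = np.zeros(d)
for _ in range(200):
    margins = diff @ w                       # (P, N) pairwise score gaps
    weights = 1.0 / (1.0 + np.exp(margins))  # sigmoid(-margin)
    grad = -(diff * weights[..., None]).mean(axis=(0, 1))
    w -= 0.5 * grad

# Empirical AUC = fraction of positive/negative pairs ranked correctly,
# which is exactly the quantity the surrogate loss targets.
scores = X @ w
auc = (scores[y == 1][:, None] > scores[y == 0][None, :]).mean()
```

Because the loss is defined over positive/negative pairs rather than individual examples, it is insensitive to the class ratio, which is the robustness-to-imbalance argument the summary alludes to.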
arXiv Detail & Related papers (2020-06-25T08:46:37Z)
- Optimizing Medical Treatment for Sepsis in Intensive Care: from Reinforcement Learning to Pre-Trial Evaluation [2.908482270923597]
Our aim is to establish a framework in which reinforcement learning (RL), used retrospectively to optimize interventions, provides a regulatory-compliant pathway to prospective clinical testing of the learned policies.
We focus on infections in intensive care units, which are among the major causes of death and are difficult to treat because of complex and opaque patient dynamics.
arXiv Detail & Related papers (2020-02-10T00:26:43Z)
- Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions [48.91284724066349]
Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education.
Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding.
We develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates.
arXiv Detail & Related papers (2020-02-10T00:26:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.