Related papers: OMG-RL:Offline Model-based Guided Reward Learning for Heparin Treatment

OMG-RL:Offline Model-based Guided Reward Learning for Heparin Treatment

URL: http://arxiv.org/abs/2409.13299v1
Date: Fri, 20 Sep 2024 07:51:37 GMT
Title: OMG-RL:Offline Model-based Guided Reward Learning for Heparin Treatment
Authors: Yooseok Lim, Sujee Lee,
Abstract summary: This study focuses on developing a reward function that reflects the clinician's intentions. We learn a parameterized reward function that includes the expert's intentions from limited data. This approach can be broadly utilized not only for the heparin dosing problem but also for RL-based medication dosing tasks in general.
Score: 0.4998632546280975
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate diagnosis of individual patient conditions and appropriate medication dosing strategies are core elements of personalized medical decision-making processes. This therapeutic procedure, which entails recursively assessing the patient's condition and administering suitable medications, can effectively be modeled as a reinforcement learning (RL) problem. Crucially, the success of RL in this context depends on the establishment of a well-defined reward function that accurately represents the optimal treatment strategy. However, defining the learning direction in RL with only a limited set of explicit indicators complicates the task due to the inherent complexity of the required domain knowledge. This approach may also increase the likelihood that the RL policy does not adequately reflect the clinician's treatment intentions, which are determined by considering various situations and indicators. In this study, we focus on developing a reward function that reflects the clinician's intentions and introduce Offline Model-based Guided Reward Learning (OMG-RL), which performs offline inverse reinforcement learning (IRL) aligned with the offline RL environment. Through OMG-RL, we learn a parameterized reward function that includes the expert's intentions from limited data, thereby enhancing the agent's policy. We validate the proposed approach on the heparin dosing task. The results demonstrate that policy learning through OMG-RL is meaningful and confirm that the learned policy is positively reinforced in terms of activated partial thromboplastin time (aPTT), a key indicator for monitoring the effects of heparin. This approach can be broadly utilized not only for the heparin dosing problem but also for RL-based medication dosing tasks in general.

Related papers

Diffusion Guidance Is a Controllable Policy Improvement Operator [98.11511661904618]
CFGRL is trained with the simplicity of supervised learning, yet can further improve on the policies in the data.<n>On offline RL tasks, we observe a reliable trend -- increased guidance weighting leads to increased performance.
arXiv Detail & Related papers (2025-05-29T14:06:50Z)
Development and Validation of Heparin Dosing Policies Using an Offline Reinforcement Learning Algorithm [0.7519918949973486]
This study proposes a reinforcement learning-based personalized optimal heparin dosing policy. A batch-constrained policy was implemented to minimize out-of-distribution errors in an offline RL environment. This research enhances heparin administration practices and establishes a precedent for the development of sophisticated decision-support tools in medicine.
arXiv Detail & Related papers (2024-09-24T05:20:38Z)
How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback. Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities. We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
Large Language Model Distilling Medication Recommendation Model [58.94186280631342]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs) Our research aims to transform existing medication recommendation methodologies using LLMs. To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action matching principle is more like an explanation of deep neural networks (DNNs) than the interpretation of RL agents. We propose to consider rewards, the essential objective of RL agents, as the essential objective of interpreting RL agents. We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z)
Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care [46.2482873419289]
We introduce a deep Q-learning approach to obtain more reliable critical care policies. We evaluate our method in off-policy and offline settings using simulated environments and real health records from intensive care units.
arXiv Detail & Related papers (2023-06-13T18:02:57Z)
Deep Offline Reinforcement Learning for Real-world Treatment Optimization Applications [3.770564448216192]
We introduce a practical and theoretically grounded transition sampling approach to address action imbalance during offline RL training. We perform extensive experiments on two real-world tasks for diabetes and sepsis treatment optimization. Across a range of principled and clinically relevant metrics, we show that our proposed approach enables substantial improvements in expected health outcomes.
arXiv Detail & Related papers (2023-02-15T09:30:57Z)
Policy Gradient for Reinforcement Learning with General Utilities [50.65940899590487]
In Reinforcement Learning (RL), the goal of agents is to discover an optimal policy that maximizes the expected cumulative rewards. Many supervised and unsupervised RL problems are not covered in the Linear RL framework. We derive the policy gradient theorem for RL with general utilities.
arXiv Detail & Related papers (2022-10-03T14:57:46Z)
Reinforcement Learning For Survival, A Clinically Motivated Method For Critically Ill Patients [0.0]
We propose a clinically motivated control objective for critically ill patients, for which the value functions have a simple medical interpretation. We experiment on a large cohort and show that our method produces results consistent with clinical knowledge.
arXiv Detail & Related papers (2022-07-17T00:06:09Z)
Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy. In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks. We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
Instabilities of Offline RL with Pre-Trained Neural Representation [127.89397629569808]
In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated. Recent theoretical advances have shown that such sample-efficient offline RL is indeed possible provided certain strong representational conditions hold. This work studies these issues from an empirical perspective to gauge how stable offline RL methods are.
arXiv Detail & Related papers (2021-03-08T18:06:44Z)
Trajectory Inspection: A Method for Iterative Clinician-Driven Design of Reinforcement Learning Studies [5.5302127686575435]
We highlight a simple approach, trajectory inspection, to bring clinicians into an iterative design process for model-based RL studies. We identify where the model recommends unexpectedly aggressive treatments or expects surprisingly positive outcomes from its recommendations.
arXiv Detail & Related papers (2020-10-08T22:03:01Z)
Reinforcement learning and Bayesian data assimilation for model-informed precision dosing in oncology [0.0]
Current strategies comprise model-informed dosing tables or are based on maximum a-posteriori estimates. We propose three novel approaches for MIPD employing Bayesian data assimilation and/or reinforcement learning to control neutropenia. These approaches have the potential to substantially reduce the incidence of life-threatening grade 4 and subtherapeutic grade 0 neutropenia.
arXiv Detail & Related papers (2020-06-01T16:38:27Z)
MOReL : Model-Based Offline Reinforcement Learning [49.30091375141527]
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. We present MOReL, an algorithmic framework for model-based offline RL. We show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.
arXiv Detail & Related papers (2020-05-12T17:52:43Z)
Is Deep Reinforcement Learning Ready for Practical Applications in Healthcare? A Sensitivity Analysis of Duel-DDQN for Hemodynamic Management in Sepsis Patients [25.71979754918741]
We perform a sensitivity analysis on a state-of-the-art RL algorithm applied to hemodynamic stabilization treatment strategies for septic patients in the ICU. We consider sensitivity of learned policies to input features, embedding model architecture, time discretization, reward function, and random seeds. We find that varying these settings can significantly impact learned policies, which suggests a need for caution when interpreting RL agent output.
arXiv Detail & Related papers (2020-05-08T22:08:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.