Guardian-regularized Safe Offline Reinforcement Learning for Smart Weaning of Mechanical Circulatory Devices
- URL: http://arxiv.org/abs/2511.06111v1
- Date: Sat, 08 Nov 2025 19:32:31 GMT
- Title: Guardian-regularized Safe Offline Reinforcement Learning for Smart Weaning of Mechanical Circulatory Devices
- Authors: Aysin Tumay, Sophia Sun, Sonia Fereidooni, Aaron Dumas, Elise Jortberg, Rose Yu,
- Abstract summary: We study the sequential decision-making problem for automated weaning of mechanical circulatory support devices in cardiogenic shock patients.<n> offline reinforcement learning has proven to be successful in sequential decision-making tasks.<n>We develop an end-to-end machine learning framework with two key contributions.
- Score: 19.512275639322638
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the sequential decision-making problem for automated weaning of mechanical circulatory support (MCS) devices in cardiogenic shock patients. MCS devices are percutaneous micro-axial flow pumps that provide left ventricular unloading and forward blood flow, but current weaning strategies vary significantly across care teams and lack data-driven approaches. Offline reinforcement learning (RL) has proven to be successful in sequential decision-making tasks, but our setting presents challenges for training and evaluating traditional offline RL methods: prohibition of online patient interaction, highly uncertain circulatory dynamics due to concurrent treatments, and limited data availability. We developed an end-to-end machine learning framework with two key contributions (1) Clinically-aware OOD-regularized Model-based Policy Optimization (CORMPO), a density-regularized offline RL algorithm for out-of-distribution suppression that also incorporates clinically-informed reward shaping and (2) a Transformer-based probabilistic digital twin that models MCS circulatory dynamics for policy evaluation with rich physiological and clinical metrics. We prove that \textsf{CORMPO} achieves theoretical performance guarantees under mild assumptions. CORMPO attains a higher reward than the offline RL baselines by 28% and higher scores in clinical metrics by 82.6% on real and synthetic datasets. Our approach offers a principled framework for safe offline policy learning in high-stakes medical applications where domain expertise and safety constraints are essential.
Related papers
- Attention-Based Offline Reinforcement Learning and Clustering for Interpretable Sepsis Treatment [0.7209528581296429]
A clustering-based stratification module categorizes patients into low, intermediate, and high-risk groups upon ICU admission.<n>A synthetic data augmentation pipeline leveraging variational autoencoders (VAE) and diffusion models enrich underrepresented trajectories such as fluid or vasopressor administration.<n>A rationale generation module powered by a multi-modal large language model produces natural-language justifications grounded in clinical context.
arXiv Detail & Related papers (2026-01-20T18:41:44Z) - Beyond Prediction: Reinforcement Learning as the Defining Leap in Healthcare AI [38.11241251343041]
Reinforcement learning (RL) marks a fundamental shift in how artificial intelligence is applied in healthcare.<n>Instead of merely predicting outcomes, RL actively decides interventions with long term goals.<n>This paper explores RL's rise in healthcare as more than a set of tools, rather a shift toward agentive intelligence in clinical environments.
arXiv Detail & Related papers (2025-08-28T07:05:24Z) - Distribution-Free Uncertainty Quantification in Mechanical Ventilation Treatment: A Conformal Deep Q-Learning Framework [2.5070297884580874]
This study introduces ConformalDQN, a distribution-free conformal deep Q-learning approach for optimizing mechanical ventilation in intensive care units.<n>We trained and evaluated our model using ICU patient records from the MIMIC-IV database.
arXiv Detail & Related papers (2024-12-17T06:55:20Z) - Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care [46.2482873419289]
We introduce a deep Q-learning approach to obtain more reliable critical care policies.
We evaluate our method in off-policy and offline settings using simulated environments and real health records from intensive care units.
arXiv Detail & Related papers (2023-06-13T18:02:57Z) - Deep Offline Reinforcement Learning for Real-world Treatment
Optimization Applications [3.770564448216192]
We introduce a practical and theoretically grounded transition sampling approach to address action imbalance during offline RL training.
We perform extensive experiments on two real-world tasks for diabetes and sepsis treatment optimization.
Across a range of principled and clinically relevant metrics, we show that our proposed approach enables substantial improvements in expected health outcomes.
arXiv Detail & Related papers (2023-02-15T09:30:57Z) - Federated Offline Reinforcement Learning [55.326673977320574]
We propose a multi-site Markov decision process model that allows for both homogeneous and heterogeneous effects across sites.
We design the first federated policy optimization algorithm for offline RL with sample complexity.
We give a theoretical guarantee for the proposed algorithm, where the suboptimality for the learned policies is comparable to the rate as if data is not distributed.
arXiv Detail & Related papers (2022-06-11T18:03:26Z) - Auto-FedRL: Federated Hyperparameter Optimization for
Multi-institutional Medical Image Segmentation [48.821062916381685]
Federated learning (FL) is a distributed machine learning technique that enables collaborative model training while avoiding explicit data sharing.
In this work, we propose an efficient reinforcement learning(RL)-based federated hyperparameter optimization algorithm, termed Auto-FedRL.
The effectiveness of the proposed method is validated on a heterogeneous data split of the CIFAR-10 dataset and two real-world medical image segmentation datasets.
arXiv Detail & Related papers (2022-03-12T04:11:42Z) - Continual Competitive Memory: A Neural System for Online Task-Free
Lifelong Learning [91.3755431537592]
We propose a novel form of unsupervised learning, continual competitive memory ( CCM)
The resulting neural system is shown to offer an effective approach for combating catastrophic forgetting in online continual classification problems.
We demonstrate that the proposed CCM system not only outperforms other competitive learning neural models but also yields performance that is competitive with several modern, state-of-the-art lifelong learning approaches.
arXiv Detail & Related papers (2021-06-24T20:12:17Z) - Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
arXiv Detail & Related papers (2021-05-17T20:16:46Z) - Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost
Functions [80.12620331438052]
deep learning has become an important tool for rapid screening of billions of molecules in silico for potential hits containing desired chemical features.
Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of ground truth labels in some datasets.
We argue in favor of directly optimizing the receiver operating characteristic (ROC) in such cases, due to its robustness to class imbalance.
arXiv Detail & Related papers (2020-06-25T08:46:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.