Benchmarking Offline Multi-Objective Reinforcement Learning in Critical Care
- URL: http://arxiv.org/abs/2512.08012v1
- Date: Mon, 08 Dec 2025 20:09:15 GMT
- Title: Benchmarking Offline Multi-Objective Reinforcement Learning in Critical Care
- Authors: Aryaman Bansal, Divya Sharma
- Abstract summary: In critical care settings, clinicians face the challenge of balancing conflicting objectives, primarily maximizing patient survival while minimizing resource utilization. Single-objective Reinforcement Learning approaches typically address this by optimizing a fixed scalarized reward function, resulting in rigid policies that fail to adapt to varying clinical priorities. This paper benchmarks three offline MORL algorithms against three single-objective baselines on the MIMIC-IV dataset.
- Score: 0.07161783472741748
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In critical care settings such as the Intensive Care Unit, clinicians face the complex challenge of balancing conflicting objectives, primarily maximizing patient survival while minimizing resource utilization (e.g., length of stay). Single-objective Reinforcement Learning approaches typically address this by optimizing a fixed scalarized reward function, resulting in rigid policies that fail to adapt to varying clinical priorities. Multi-objective Reinforcement Learning (MORL) offers a solution by learning a set of optimal policies along the Pareto Frontier, allowing for dynamic preference selection at test time. However, applying MORL in healthcare necessitates strict offline learning from historical data. In this paper, we benchmark three offline MORL algorithms, Conditioned Conservative Pareto Q-Learning (CPQL), Adaptive CPQL, and a modified Pareto Efficient Decision Agent (PEDA) Decision Transformer (PEDA DT), against three scalarized single-objective baselines (BC, CQL, and DDQN) on the MIMIC-IV dataset. Using Off-Policy Evaluation (OPE) metrics, we demonstrate that the PEDA DT algorithm offers superior flexibility compared to static scalarized baselines. Notably, our results extend previous findings on single-objective Decision Transformers in healthcare, confirming that sequence modeling architectures remain robust and effective when scaled to multi-objective conditioned generation. These findings suggest that offline MORL is a promising framework for enabling personalized, adjustable decision-making in critical care without the need for retraining.
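To make the abstract's contrast concrete, the sketch below compares a fixed scalarization against test-time preference selection over a Pareto set. The reward dimensions, returns, and policies are hypothetical illustrations, not the paper's code or data.

```python
import numpy as np

# Hypothetical vector returns for three candidate ICU policies:
# each row is (expected survival return, negative expected length of stay).
pareto_returns = np.array([
    [0.90, -9.0],   # survival-focused
    [0.85, -6.5],   # balanced
    [0.78, -4.0],   # resource-focused
])

def scalarize(returns, w):
    """Linear scalarization: collapse a reward vector with weights w."""
    return returns @ w

# A single-objective agent commits to one weight vector at training time.
fixed_choice = np.argmax(scalarize(pareto_returns, np.array([0.8, 0.2])))

# A MORL agent keeps the whole Pareto set, so a clinician can pick a
# preference at test time without any retraining.
for w in ([0.9, 0.1], [0.5, 0.5], [0.2, 0.8]):
    best = np.argmax(scalarize(pareto_returns, np.array(w)))
    print(f"preference {w} -> policy {best}")
```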
Related papers
- From Robotics to Sepsis Treatment: Offline RL via Geometric Pessimism [0.0]
Methods like CQL offer rigorous conservatism but require substantial compute. IQL often fails to correct OOD errors on pathological datasets, collapsing to Behavioural Cloning. We propose Geometric Pessimism, a compute-efficient framework that augments standard IQL with a density-based penalty.
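The summary names only a density-based penalty on top of IQL; one plausible instantiation (an assumption on my part, not the paper's definition) penalizes Q-values by the average distance of a state-action pair to its k nearest neighbours in the offline dataset:

```python
import numpy as np

def density_penalty(query_sa, dataset_sa, k=5, beta=1.0):
    """Pessimism proportional to how far (s, a) lies from the data.

    dataset_sa: (N, d) state-action features from the offline dataset.
    Returns a value to subtract from the Q-estimate: near the data the
    penalty is small; far from it (low density), the penalty grows.
    """
    dists = np.linalg.norm(dataset_sa - query_sa, axis=1)
    return beta * np.sort(dists)[:k].mean()

data = np.random.default_rng(0).standard_normal((500, 6))
print(density_penalty(np.zeros(6), data))        # in-distribution: small
print(density_penalty(np.full(6, 5.0), data))    # out-of-distribution: large
```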
arXiv Detail & Related papers (2026-02-09T13:48:25Z) - Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates. SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence. Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
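The summary leaves the SNR criterion abstract; a minimal reading (mine, not necessarily SAGE's) scores a batch of per-example policy gradients by how strongly they agree on a common update direction:

```python
import numpy as np

def gradient_snr(per_example_grads):
    """SNR of per-example gradients (n, d): the norm of the shared update
    direction relative to the average deviation around it."""
    mean_g = per_example_grads.mean(axis=0)
    noise = np.linalg.norm(per_example_grads - mean_g, axis=1).mean()
    return np.linalg.norm(mean_g) / (noise + 1e-8)

rng = np.random.default_rng(0)
consistent = np.ones(10) + 0.1 * rng.standard_normal((32, 10))  # high SNR
conflicting = rng.standard_normal((32, 10))                     # low SNR
print(gradient_snr(consistent), ">", gradient_snr(conflicting))
```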
arXiv Detail & Related papers (2026-02-01T12:56:10Z) - A Closed-Loop Personalized Learning Agent Integrating Neural Cognitive Diagnosis, Bounded-Ability Adaptive Testing, and LLM-Driven Feedback [5.190121417265426]
This paper presents an end-to-end personalized learning agent, which integrates a Neural Cognitive Diagnosis model (NCD), a Bounded-Ability Computerized Adaptive Testing strategy (BECAT), and large language models (LLMs). Experiments on the ASSISTments dataset show that the NCD module achieves strong performance on response prediction while yielding interpretable mastery assessments. Overall, the results indicate that the proposed design is effective and practically deployable.
arXiv Detail & Related papers (2025-10-26T07:32:31Z) - Reinforcement Learning enhanced Online Adaptive Clinical Decision Support via Digital Twin powered Policy and Treatment Effect optimized Reward [3.3025649517524793]
We present an online adaptive tool where reinforcement learning provides the policy, a patient digital twin provides the environment, and treatment effect defines the reward. Experiments in a synthetic clinical simulator show low latency, stable throughput, a low expert query rate at fixed safety, and improved return against standard value-based baselines.
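The division of labour described here (RL policy, digital-twin environment, treatment-effect reward) can be sketched as a closed interaction loop; every component below is a toy placeholder, not the paper's models:

```python
import numpy as np

rng = np.random.default_rng(0)

def twin_step(state, action):          # digital twin: simulated physiology
    return state + 0.1 * action - 0.02 * state + 0.01 * rng.standard_normal(4)

def treatment_effect(state, action):   # reward: estimated benefit of acting
    return float(action) * (1.0 - float(state.mean()))

def policy(state):                     # stand-in policy (a threshold rule)
    return 1.0 if state.mean() < 0.5 else 0.0

state, ret = np.full(4, 0.3), 0.0
for _ in range(24):                    # one closed-loop episode
    a = policy(state)
    ret += treatment_effect(state, a)
    state = twin_step(state, a)
print(f"episode return: {ret:.2f}")
```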
arXiv Detail & Related papers (2025-08-24T04:51:22Z) - Test-time Offline Reinforcement Learning on Goal-related Experience [50.94457794664909]
Research in foundation models has shown that performance can be substantially improved through test-time training. We propose a novel self-supervised data selection criterion, which selects transitions from an offline dataset according to their relevance to the current state. Our goal-conditioned test-time training (GC-TTT) algorithm applies this routine in a receding-horizon fashion during evaluation, adapting the policy to the current trajectory as it is being rolled out.
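A minimal version of the selection step, simplifying "relevance to the current state" to nearest-neighbour distance (my simplification, not necessarily the paper's measure):

```python
import numpy as np

def select_relevant(dataset_states, current_state, budget=256):
    """Pick the offline transitions whose states lie closest to the current
    state; these form the test-time fine-tuning batch."""
    d = np.linalg.norm(dataset_states - current_state, axis=1)
    return np.argsort(d)[:budget]

# Receding horizon: at every evaluation step, re-select data around the
# newly reached state and fine-tune the policy on that batch.
states = np.random.default_rng(0).standard_normal((10_000, 8))
batch_idx = select_relevant(states, current_state=np.zeros(8))
print(batch_idx.shape)   # (256,)
```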
arXiv Detail & Related papers (2025-07-24T21:11:39Z) - medDreamer: Model-Based Reinforcement Learning with Latent Imagination on Complex EHRs for Clinical Decision Support [3.8382507197481144]
medDreamer is a novel model-based reinforcement learning framework for personalized treatment recommendation. It simulates latent patient states from irregular data and trains a two-phase policy on a hybrid of real and imagined trajectories. It significantly outperforms model-free and model-based baselines in both clinical outcomes and off-policy metrics.
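A toy rendering of the real-plus-imagined training data idea: a stand-in latent dynamics model rolls out extra transitions from encoded real states (all names and dynamics here are placeholders, not medDreamer's architecture):

```python
import numpy as np

rng = np.random.default_rng(1)

def latent_dynamics(z, a):             # stand-in for a learned world model
    return 0.9 * z + 0.1 * a + 0.02 * rng.standard_normal(z.shape)

def imagine(z0, policy, steps=5):
    """Roll the world model forward from a real latent state."""
    traj, z = [], z0
    for _ in range(steps):
        a = policy(z)
        z_next = latent_dynamics(z, a)
        traj.append((z, a, z_next))
        z = z_next
    return traj

real_latents = rng.standard_normal((8, 3))   # encoded real EHR states
policy = lambda z: np.tanh(z)                # placeholder policy
imagined = [t for z0 in real_latents for t in imagine(z0, policy)]
print(f"{len(imagined)} imagined transitions to mix with the real ones")
```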
arXiv Detail & Related papers (2025-05-26T10:16:39Z) - UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [51.00436121587591]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours. We focus on the case of linear utility functions parametrised by weight vectors w. We introduce a method based on the Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
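The search described here maps directly onto a UCB bandit whose arms are candidate weight vectors; a minimal sketch, with a synthetic noisy evaluator standing in for training and evaluating a policy under each w:

```python
import numpy as np

def ucb_weight_search(candidates, evaluate, rounds=200, c=2.0):
    """Treat each weight vector w as a bandit arm and repeatedly pull the
    arm with the highest upper confidence bound on its scalarized return."""
    n = len(candidates)
    counts, means = np.zeros(n), np.zeros(n)
    for t in range(1, rounds + 1):
        ucb = means + c * np.sqrt(np.log(t) / np.maximum(counts, 1))
        ucb[counts == 0] = np.inf               # try every arm at least once
        i = int(np.argmax(ucb))
        r = evaluate(candidates[i])             # train/evaluate under w_i
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]
    return candidates[int(np.argmax(means))]

rng = np.random.default_rng(0)
ws = [np.array([a, 1 - a]) for a in np.linspace(0, 1, 11)]
noisy_return = lambda w: w @ np.array([0.6, 0.4]) + 0.05 * rng.standard_normal()
print("most promising weights:", ucb_weight_search(ws, noisy_return))
```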
arXiv Detail & Related papers (2024-05-01T09:34:42Z) - Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while using no exemplar buffer and only 1.02x the size of the base model.
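Since the method's name rests on HSIC, the standard empirical estimator trace(KHLH)/(n-1)^2 is worth recalling; the sketch below computes it with Gaussian kernels (the generic formula, not the paper's training code):

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic(X, Y, sigma=1.0):
    """Empirical HSIC: trace(K H L H) / (n - 1)^2, a kernel dependence
    measure that is zero (in the limit) iff X and Y are independent."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = gaussian_gram(X, sigma), gaussian_gram(Y, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
print(hsic(X, X), ">", hsic(X, rng.standard_normal((100, 2))))
```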
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks from the D4RL benchmarks.
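Setting DPE's specifics aside, the role off-policy evaluation plays here is easiest to see in the plain per-trajectory importance-sampling estimator it aims to improve on (a generic baseline, not DPE itself):

```python
import numpy as np

def is_ope(trajectories, pi_eval, pi_behavior, gamma=0.99):
    """Per-trajectory importance sampling: reweight each logged return by
    the likelihood ratio between evaluation and behavior policies."""
    values = []
    for traj in trajectories:                  # traj: list of (s, a, r)
        ratio, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            ratio *= pi_eval(a, s) / pi_behavior(a, s)
            ret += gamma**t * r
        values.append(ratio * ret)
    return float(np.mean(values))

# Toy check: one-step trajectories logged under a uniform two-action policy.
rng = np.random.default_rng(0)
trajs = [[(0, int(a), float(a))] for a in rng.integers(0, 2, 1000)]
pi_b = lambda a, s: 0.5                        # behavior: uniform
pi_e = lambda a, s: 0.9 if a == 1 else 0.1     # evaluation: prefers action 1
print(is_ope(trajs, pi_e, pi_b))               # close to 0.9, pi_e's true value
```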
arXiv Detail & Related papers (2023-08-28T20:46:07Z) - Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care [46.2482873419289]
We introduce a deep Q-learning approach to obtain more reliable critical care policies.
We evaluate our method in off-policy and offline settings using simulated environments and real health records from intensive care units.
arXiv Detail & Related papers (2023-06-13T18:02:57Z) - Deep Offline Reinforcement Learning for Real-world Treatment Optimization Applications [3.770564448216192]
We introduce a practical and theoretically grounded transition sampling approach to address action imbalance during offline RL training.
We perform extensive experiments on two real-world tasks for diabetes and sepsis treatment optimization.
Across a range of principled and clinically relevant metrics, we show that our proposed approach enables substantial improvements in expected health outcomes.
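One standard realization of such transition sampling (a plausible reading, not necessarily the paper's exact scheme) draws transitions with probability inversely proportional to their action's frequency in the log:

```python
import numpy as np

def balanced_sampling_probs(actions, n_actions):
    """Sampling weights that counteract action imbalance: transitions with
    rare actions (e.g., seldom-used treatments) are drawn more often."""
    counts = np.bincount(actions, minlength=n_actions).astype(float)
    w = 1.0 / np.maximum(counts[actions], 1.0)
    return w / w.sum()

actions = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])    # imbalanced logged actions
p = balanced_sampling_probs(actions, n_actions=3)
batch = np.random.default_rng(0).choice(len(actions), size=6, p=p)
print(batch)   # rare actions 1 and 2 appear far beyond their raw frequency
```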
arXiv Detail & Related papers (2023-02-15T09:30:57Z) - Addressing the issue of stochastic environments and local decision-making in multi-objective reinforcement learning [0.0]
Multi-objective reinforcement learning (MORL) is a relatively new field that builds on conventional Reinforcement Learning (RL).
This thesis focuses on what factors influence the frequency with which value-based MORL Q-learning algorithms learn the optimal policy for an environment.
arXiv Detail & Related papers (2022-11-16T04:56:42Z) - Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes [93.61202366677526]
We study offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose various policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy.
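For intuition on how instruments remove confounding, the classical two-stage least squares estimator is the simplest reference point; the sketch below is a generic regression example, far simpler than the paper's sequential policy-learning setting:

```python
import numpy as np

def two_stage_least_squares(z, x, y):
    """2SLS: stage 1 predicts treatment x from instrument z; stage 2
    regresses outcome y on that prediction, discarding the confounded
    variation in x."""
    Z = np.column_stack([np.ones_like(z), z])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]    # stage 1
    X = np.column_stack([np.ones_like(x_hat), x_hat])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]      # stage 2: effect of x

rng = np.random.default_rng(0)
u = rng.standard_normal(2000)             # unmeasured confounder
z = rng.standard_normal(2000)             # instrument: moves x, not y directly
x = z + u + 0.1 * rng.standard_normal(2000)
y = 2.0 * x + 3.0 * u + 0.1 * rng.standard_normal(2000)
print(two_stage_least_squares(z, x, y))   # close to the true effect 2.0
```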
arXiv Detail & Related papers (2022-09-18T22:03:55Z)