Related papers: Test-Time Learning and Inference-Time Deliberation for Efficiency-First Offline Reinforcement Learning in Care Coordination and Population Health Management

Test-Time Learning and Inference-Time Deliberation for Efficiency-First Offline Reinforcement Learning in Care Coordination and Population Health Management

URL: http://arxiv.org/abs/2509.16291v1
Date: Fri, 19 Sep 2025 14:41:47 GMT
Title: Test-Time Learning and Inference-Time Deliberation for Efficiency-First Offline Reinforcement Learning in Care Coordination and Population Health Management
Authors: Sanjay Basu, Sadiq Y. Patel, Parth Sheth, Bhairavi Muralidharan, Namrata Elamaran, Aakriti Kinra, Rajaie Batniji,
Abstract summary: Care coordination and population health management programs serve large Medicaid and safety-net populations.<n>We propose a lightweight offline reinforcement learning (RL) approach that augments trained policies with (i) test-time learning via local neighborhood calibration, and (ii) inference-time deliberation via a small Q-ensemble.
Score: 1.5635627702544692
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Care coordination and population health management programs serve large Medicaid and safety-net populations and must be auditable, efficient, and adaptable. While clinical risk for outreach modalities is typically low, time and opportunity costs differ substantially across text, phone, video, and in-person visits. We propose a lightweight offline reinforcement learning (RL) approach that augments trained policies with (i) test-time learning via local neighborhood calibration, and (ii) inference-time deliberation via a small Q-ensemble that incorporates predictive uncertainty and time/effort cost. The method exposes transparent dials for neighborhood size and uncertainty/cost penalties and preserves an auditable training pipeline. Evaluated on a de-identified operational dataset, TTL+ITD achieves stable value estimates with predictable efficiency trade-offs and subgroup auditing.

Related papers

The hidden risks of temporal resampling in clinical reinforcement learning [0.0]
We show that temporal resampling significantly degrades the performance of offline reinforcement learning algorithms during live deployment.<n>We propose three mechanisms that drive this failure: the generation of counterfactual trajectories, the distortion of temporal expectations, and the compounding of generalisation errors.
arXiv Detail & Related papers (2026-02-06T11:02:06Z)
Hybrid Adaptive Conformal Offline Reinforcement Learning for Fair Population Health Management [1.5635627702544692]
Population health management programs for Medicaid populations coordinate longitudinal outreach and services.<n>We present a Hybrid Adaptive Conformal Offline Reinforcement Learning framework that separates risk calibration from preference optimization to generate conservative action recommendations.
arXiv Detail & Related papers (2025-09-11T18:09:28Z)
NurseSchedRL: Attention-Guided Reinforcement Learning for Nurse-Patient Assignment [0.0]
NurseSchedRL is a reinforcement learning framework for nurse-patient assignment.<n>It integrates structured state encoding, constrained action masking, and attention-based representations of skills, fatigue, and geographical context.
arXiv Detail & Related papers (2025-09-10T21:41:42Z)
From Coordination to Personalization: A Trust-Aware Simulation Framework for Emergency Department Decision Support [0.0]
This study proposes a simulation-based framework for modeling doctors and nurses as intelligent agents guided by computational trust mechanisms.<n>The proposed framework demonstrates the potential of computational trust for evidence-based decision support in emergency medicine.
arXiv Detail & Related papers (2025-09-09T18:00:44Z)
Large Language Models and User Trust: Consequence of Self-Referential Learning Loop and the Deskilling of Healthcare Professionals [1.6574413179773761]
This paper explores the evolving relationship between clinician trust in LLMs and the impact of data sources from predominantly human-generated to AI-generated content. One of the primary concerns identified is the potential feedback loop that arises as LLMs become more reliant on their outputs for learning. A key takeaway from our investigation is the critical role of user expertise and the necessity for a discerning approach to trusting and validating LLM outputs.
arXiv Detail & Related papers (2024-03-15T04:04:45Z)
Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care [46.2482873419289]
We introduce a deep Q-learning approach to obtain more reliable critical care policies. We evaluate our method in off-policy and offline settings using simulated environments and real health records from intensive care units.
arXiv Detail & Related papers (2023-06-13T18:02:57Z)
U-PASS: an Uncertainty-guided deep learning Pipeline for Automated Sleep Staging [61.6346401960268]
We propose a machine learning pipeline called U-PASS tailored for clinical applications that incorporates uncertainty estimation at every stage of the process. We apply our uncertainty-guided deep learning pipeline to the challenging problem of sleep staging and demonstrate that it systematically improves performance at every stage.
arXiv Detail & Related papers (2023-06-07T08:27:36Z)
Policy Optimization for Personalized Interventions in Behavioral Health [8.10897203067601]
Behavioral health interventions, delivered through digital platforms, have the potential to significantly improve health outcomes. We study the problem of optimizing personalized interventions for patients to maximize a long-term outcome. We present a new approach for this problem that we dub DecompPI, which decomposes the state space for a system of patients to the individual level.
arXiv Detail & Related papers (2023-03-21T21:42:03Z)
Adapting to Continuous Covariate Shift via Online Density Ratio Estimation [64.8027122329609]
Dealing with distribution shifts is one of the central challenges for modern machine learning. We propose an online method that can appropriately reuse historical information. Our density ratio estimation method is proven to perform well by enjoying a dynamic regret bound.
arXiv Detail & Related papers (2023-02-06T04:03:33Z)
Automated Fidelity Assessment for Strategy Training in Inpatient Rehabilitation using Natural Language Processing [53.096237570992294]
Strategy training is a rehabilitation approach that teaches skills to reduce disability among those with cognitive impairments following a stroke. Standardized fidelity assessment is used to measure adherence to treatment principles. We developed a rule-based NLP algorithm, a long-short term memory (LSTM) model, and a bidirectional encoder representation from transformers (BERT) model for this task.
arXiv Detail & Related papers (2022-09-14T15:33:30Z)
Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration. Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
arXiv Detail & Related papers (2021-05-17T20:16:46Z)
Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions [48.91284724066349]
Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education. Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding. We develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates.
arXiv Detail & Related papers (2020-02-10T00:26:43Z)
Contextual Constrained Learning for Dose-Finding Clinical Trials [102.8283665750281]
C3T-Budget is a contextual constrained clinical trial algorithm for dose-finding under both budget and safety constraints. It recruits patients with consideration of the remaining budget, the remaining time, and the characteristics of each group.
arXiv Detail & Related papers (2020-01-08T11:46:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.