Hybrid Adaptive Conformal Offline Reinforcement Learning for Fair Population Health Management
- URL: http://arxiv.org/abs/2509.09772v1
- Date: Thu, 11 Sep 2025 18:09:28 GMT
- Title: Hybrid Adaptive Conformal Offline Reinforcement Learning for Fair Population Health Management
- Authors: Sanjay Basu, Sadiq Y. Patel, Parth Sheth, Bhairavi Muralidharan, Namrata Elamaran, Aakriti Kinra, Rajaie Batniji
- Abstract summary: Population health management programs for Medicaid populations coordinate longitudinal outreach and services. We present a Hybrid Adaptive Conformal Offline Reinforcement Learning framework that separates risk calibration from preference optimization to generate conservative action recommendations.
- Score: 1.5635627702544692
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Population health management programs for Medicaid populations coordinate longitudinal outreach and services (e.g., benefits navigation, behavioral health, social needs support, and clinical scheduling) and must be safe, fair, and auditable. We present a Hybrid Adaptive Conformal Offline Reinforcement Learning (HACO) framework that separates risk calibration from preference optimization to generate conservative action recommendations at scale. In our setting, each step involves choosing among common coordination actions (e.g., which member to contact, by which modality, and whether to route to a specialized service) while controlling the near-term risk of adverse utilization events (e.g., unplanned emergency department visits or hospitalizations). Using a de-identified operational dataset from Waymark comprising 2.77 million sequential decisions across 168,126 patients, HACO (i) trains a lightweight risk model for adverse events, (ii) derives a conformal threshold to mask unsafe actions at a target risk level, and (iii) learns a preference policy on the resulting safe subset. We evaluate policies with a version-agnostic fitted Q evaluation (FQE) on stratified subsets and audit subgroup performance across age, sex, and race. HACO achieves strong risk discrimination (AUC ≈ 0.81) with a calibrated threshold (τ ≈ 0.038 at α = 0.10), while maintaining high safe coverage. Subgroup analyses reveal systematic differences in estimated value across demographics, underscoring the importance of fairness auditing. Our results show that conformal risk gating integrates cleanly with offline RL to deliver conservative, auditable decision support for population health management teams.
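A minimal sketch of the conformal gating in steps (ii) and (iii) above, assuming a risk model has already been fit offline. The function names, the toy Beta-distributed scores, and the fallback rule for an empty safe set are illustrative assumptions, not details from the paper.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.10):
    """Split-conformal quantile of predicted adverse-event risks
    on a held-out calibration set (illustrative construction)."""
    n = len(cal_scores)
    # finite-sample-adjusted quantile level
    q = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(cal_scores, q, method="higher"))

def safe_action_mask(action_risks, tau):
    """Mask actions whose predicted risk exceeds tau; fall back to
    the least risky action if nothing passes (assumed behavior)."""
    mask = action_risks <= tau
    if not mask.any():
        mask[np.argmin(action_risks)] = True
    return mask

# Toy usage: calibrate once, then gate each decision's action set.
rng = np.random.default_rng(0)
tau = conformal_threshold(rng.beta(1, 20, size=5000), alpha=0.10)
action_risks = rng.beta(1, 20, size=7)   # 7 candidate actions
print(tau, safe_action_mask(action_risks, tau))
```

A preference policy would then be learned only over actions where the mask is true, which is what makes the resulting recommendations conservative.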
Related papers
- An interpretable data-driven approach to optimizing clinical fall risk assessment [0.0559762074594338]
We aim to better align fall risk prediction from the Johns Hopkins Fall Risk Assessment Tool (JHFRAT) with clinically meaningful measures via a data-driven modelling approach. We employed constrained score optimization models to reweight the JHFRAT scoring weights while preserving its additive structure and clinical thresholds. The model demonstrated significant improvements in predictive performance over the current JHFRAT.
arXiv Detail & Related papers (2026-01-08T18:17:31Z)
- Adaptive Conformal Prediction via Bayesian Uncertainty Weighting for Hierarchical Healthcare Data [2.922743999325622]
We present a hybrid Bayesian-conformal framework that addresses a fundamental limitation in healthcare predictions. Our approach integrates Bayesian hierarchical random forests with group-aware conformal calibration, using posterior uncertainties to weight conformity scores. We evaluate our method on 61,538 admissions across 3,793 U.S. hospitals and 4 regions.
arXiv Detail & Related papers (2026-01-03T16:06:37Z) - Feasibility-Guided Fair Adaptive Offline Reinforcement Learning for Medicaid Care Management [1.5635627702544692]
We introduce Feasibility-Guided Fair Adaptive Reinforcement Learning (FG-FARL). FG-FARL calibrates per-group safety thresholds to reduce harm while equalizing a chosen fairness target (coverage or harm) across protected subgroups; a sketch of per-group calibration appears after this list.
arXiv Detail & Related papers (2025-09-11T17:50:06Z)
- Conditional Conformal Risk Adaptation [9.559062601251464]
We develop a new score function for creating adaptive prediction sets that significantly improves conditional risk control for segmentation tasks. We introduce a specialized probability calibration framework that enhances the reliability of pixel-wise inclusion estimates. Our experiments on polyp segmentation demonstrate that all three methods provide valid marginal risk control and deliver more consistent conditional risk control.
arXiv Detail & Related papers (2025-04-10T10:01:06Z)
- Towards Regulatory-Confirmed Adaptive Clinical Trials: Machine Learning Opportunities and Solutions [59.28853595868749]
We introduce two new objectives for future clinical trials that integrate regulatory constraints and treatment policy value for both the entire population and under-served populations. We formulate Randomize First Augment Next (RFAN), a new framework for designing Phase III clinical trials. Our framework consists of a standard randomized component followed by an adaptive one, jointly designed to efficiently and safely acquire patients and assign them to treatment arms during the trial.
arXiv Detail & Related papers (2025-03-12T10:17:54Z)
- Distribution-Free Uncertainty Quantification in Mechanical Ventilation Treatment: A Conformal Deep Q-Learning Framework [2.5070297884580874]
This study introduces ConformalDQN, a distribution-free conformal deep Q-learning approach for optimizing mechanical ventilation in intensive care units. We trained and evaluated our model using ICU patient records from the MIMIC-IV database.
arXiv Detail & Related papers (2024-12-17T06:55:20Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- Deep Reinforcement Learning for Efficient and Fair Allocation of Health Care Resources [47.57108369791273]
Scarcity of health care resources can make rationing unavoidable.
There is no universally accepted standard for health care resource allocation protocols.
We propose a transformer-based deep Q-network to integrate the disease progression of individual patients and the interaction effects among patients.
arXiv Detail & Related papers (2023-09-15T17:28:06Z)
- Optimal and Fair Encouragement Policy Evaluation and Learning [11.712023983596914]
We study causal identification and robust estimation of optimal treatment rules, including under potential violations of positivity.
We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds.
We illustrate the methods in three case studies using data from SNAP benefit reminders, randomized encouragement to enroll in insurance, and pretrial supervised release with electronic monitoring.
arXiv Detail & Related papers (2023-09-12T20:45:30Z)
- Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations; a short worked CVaR computation appears after this list.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z)
- Is Risk-Sensitive Reinforcement Learning Properly Resolved? [54.00107408956307]
We propose a novel algorithm, namely Trajectory Q-Learning (TQL), for risk-sensitive RL problems with provable policy improvement. Based on our new learning architecture, we can introduce a general and practical implementation for different risk measures to learn disparate risk-sensitive policies.
arXiv Detail & Related papers (2023-07-02T11:47:21Z)
- Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
arXiv Detail & Related papers (2022-01-31T20:58:47Z)
- Post-Contextual-Bandit Inference [57.88785630755165]
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good or even best policies.
To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies.
arXiv Detail & Related papers (2021-06-01T12:01:51Z)
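FG-FARL above calibrates safety thresholds per protected subgroup rather than globally. The sketch below illustrates that general idea under the same split-conformal quantile rule as before; the function name, grouping variable, and toy score distributions are illustrative assumptions, not details from the paper.

```python
import numpy as np

def per_group_thresholds(cal_scores, groups, alpha=0.10):
    """One split-conformal threshold per protected subgroup.
    Calibrating within each group targets equal coverage across
    groups, at the cost of noisier thresholds for small groups."""
    thresholds = {}
    for g in np.unique(groups):
        s = cal_scores[groups == g]
        n = len(s)
        q = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
        thresholds[g] = float(np.quantile(s, q, method="higher"))
    return thresholds

# Toy usage: the riskier group "B" receives a larger threshold,
# so both groups keep roughly the same safe-action coverage.
rng = np.random.default_rng(1)
groups = rng.choice(["A", "B"], size=4000)
scores = np.where(groups == "A",
                  rng.beta(1, 30, size=4000),
                  rng.beta(1, 10, size=4000))
print(per_group_thresholds(scores, groups))
```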
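Several of the risk-sensitive RL entries above optimize a Conditional Value-at-Risk (CVaR) objective. As a concrete anchor for that term, the snippet below computes the empirical lower-tail CVaR, the mean of the worst alpha-fraction of returns; the iterated CVaR objective in the listed paper applies such a risk operator step-wise rather than once per trajectory.

```python
import numpy as np

def empirical_cvar(returns, alpha=0.10):
    """Lower-tail CVaR: the mean of the worst alpha-fraction of
    returns. A simple empirical estimator for illustration."""
    k = max(1, int(np.ceil(alpha * len(returns))))
    return float(np.sort(returns)[:k].mean())

# Toy usage: CVaR sits far below the mean for the same returns,
# which is the gap a risk-sensitive policy tries to control.
rng = np.random.default_rng(2)
returns = rng.normal(loc=1.0, scale=1.0, size=10_000)
print(returns.mean(), empirical_cvar(returns, alpha=0.10))
```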