Feasibility-Guided Fair Adaptive Offline Reinforcement Learning for Medicaid Care Management
- URL: http://arxiv.org/abs/2509.09655v1
- Date: Thu, 11 Sep 2025 17:50:06 GMT
- Title: Feasibility-Guided Fair Adaptive Offline Reinforcement Learning for Medicaid Care Management
- Authors: Sanjay Basu, Sadiq Y. Patel, Parth Sheth, Bhairavi Muralidharan, Namrata Elamaran, Aakriti Kinra, Rajaie Batniji
- Abstract summary: We introduce Feasibility-Guided Fair Adaptive Reinforcement Learning (FG-FARL). FG-FARL calibrates per-group safety thresholds to reduce harm while equalizing a chosen fairness target (coverage or harm) across protected subgroups.
- Score: 1.5635627702544692
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Feasibility-Guided Fair Adaptive Reinforcement Learning (FG-FARL), an offline RL procedure that calibrates per-group safety thresholds to reduce harm while equalizing a chosen fairness target (coverage or harm) across protected subgroups. Using de-identified longitudinal trajectories from a Medicaid population health management program, we evaluate FG-FARL against behavior cloning (BC) and HACO (Hybrid Adaptive Conformal Offline RL; a global conformal safety baseline). We report off-policy value estimates with bootstrap 95% confidence intervals and subgroup disparity analyses with p-values. FG-FARL achieves comparable value to baselines while improving fairness metrics, demonstrating a practical path to safer and more equitable decision support.
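The core mechanism named in the abstract, per-group calibration of safety thresholds, can be sketched in a few lines. Below is a minimal split-conformal illustration under assumed inputs (calibration harm scores, subgroup labels, a per-action predicted-harm model); all names are hypothetical, and this is not the paper's implementation.

```python
# Minimal sketch of per-group split-conformal safety calibration, the idea
# the FG-FARL abstract describes; names and the harm model are illustrative.
import numpy as np

def per_group_thresholds(harm_scores, groups, alpha=0.1):
    """Per-subgroup conformal (1 - alpha) quantile of calibration harm scores,
    using the finite-sample rank ceil((n + 1) * (1 - alpha))."""
    harm_scores, groups = np.asarray(harm_scores), np.asarray(groups)
    thresholds = {}
    for g in np.unique(groups):
        s = np.sort(harm_scores[groups == g])
        n = len(s)
        k = min(int(np.ceil((n + 1) * (1 - alpha))), n) - 1  # 0-indexed rank
        thresholds[g] = float(s[k])
    return thresholds

def feasible_actions(predicted_harm, group, thresholds):
    """Mask of actions whose predicted harm stays below the group's cutoff;
    a policy would then optimize only over this feasible set."""
    return np.asarray(predicted_harm) <= thresholds[group]

# Toy usage: subgroups with different harm distributions get different cutoffs.
rng = np.random.default_rng(0)
groups = rng.integers(0, 2, size=500)
harm = rng.gamma(shape=2.0 + groups, scale=1.0)  # group 1 is riskier
thr = per_group_thresholds(harm, groups, alpha=0.1)
print(thr)
print(feasible_actions([1.0, 4.0, 9.0], 1, thr))
```

Equalizing the fairness target then amounts to tuning each group's alpha so that empirical coverage (or harm) rates match across subgroups, which is the equalization the abstract describes.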
Related papers
- Mitigating Safety Tax via Distribution-Grounded Refinement in Large Reasoning Models [63.368505631152594]
Safety alignment incurs a safety tax that perturbs a large reasoning model's (LRM) general reasoning ability. Existing datasets used for safety alignment of an LRM are usually constructed by distilling safety reasoning traces and answers from an external LRM or human labeler. We propose a safety alignment dataset construction method, dubbed DGR. DGR transforms and refines an existing out-of-distribution safety reasoning dataset to align it with the target LLM's inner distribution.
arXiv Detail & Related papers (2026-02-02T14:18:48Z) - Safe, Efficient, and Robust Reinforcement Learning for Ranking and Diffusion Models [2.231476498067998]
This dissertation investigates how reinforcement learning methods can be designed to be safe, sample-efficient, and robust. Framed through the unifying perspective of contextual-bandit RL, the work addresses two major application domains: ranking and recommendation, and text-to-image diffusion models.
arXiv Detail & Related papers (2025-10-17T08:37:38Z) - Hybrid Adaptive Conformal Offline Reinforcement Learning for Fair Population Health Management [1.5635627702544692]
Population health management programs for Medicaid populations coordinate longitudinal outreach and services. We present a Hybrid Adaptive Conformal Offline Reinforcement Learning framework that separates risk calibration from preference optimization to generate conservative action recommendations.
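For contrast with the per-group thresholds sketched above, HACO is described here as a global conformal safety baseline. A minimal sketch of one pooled threshold, with risk calibration kept separate from preference optimization (function and variable names are again illustrative, not the paper's API):

```python
# Global (group-agnostic) split-conformal gate followed by preference argmax;
# a generic sketch of the calibrate-then-optimize separation, not HACO's code.
import numpy as np

def global_threshold(harm_scores, alpha=0.1):
    """Single conformal cutoff computed over the pooled calibration set."""
    s = np.sort(np.asarray(harm_scores))
    n = len(s)
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n) - 1
    return float(s[k])

def recommend(q_values, predicted_harm, tau):
    """Risk calibration first (mask harmful actions), preference optimization
    second (argmax of Q among the survivors)."""
    q = np.asarray(q_values, dtype=float)
    h = np.asarray(predicted_harm, dtype=float)
    safe = h <= tau
    if not safe.any():              # conservative fallback: least-harmful action
        return int(np.argmin(h))
    return int(np.argmax(np.where(safe, q, -np.inf)))
```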
arXiv Detail & Related papers (2025-09-11T18:09:28Z) - Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models [94.39278422567955]
Fine-tuning large language models (LLMs) on human preferences has proven successful in enhancing their capabilities. However, ensuring the safety of LLMs during fine-tuning remains a critical concern. We propose a supervised learning framework called Bi-Factorial Preference Optimization (BFPO) to address this issue.
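In the abstract's terms the two factors are helpfulness and safety; a generic scalarized objective illustrates the balancing problem (this is not BFPO's actual objective, whose contribution is a supervised surrogate for such a joint goal):

$$\max_{\pi}\;\mathbb{E}_{x\sim\mathcal{D},\,y\sim\pi(\cdot\mid x)}\big[(1-\lambda)\,r_{\text{help}}(x,y)+\lambda\,r_{\text{safe}}(x,y)\big],\qquad \lambda\in[0,1].$$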
arXiv Detail & Related papers (2024-08-27T17:31:21Z) - Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF [80.32171988565999]
We introduce a unified approach to online and offline RLHF -- value-incentivized preference optimization (VPO). VPO regularizes the maximum-likelihood estimate of the reward function with the corresponding value function. Experiments on text summarization and dialog verify the practicality and effectiveness of VPO.
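Paraphrasing the abstract in symbols (my notation; the sign convention is a sketch, not the paper's exact statement): VPO estimates the reward as

$$\hat{r}\in\arg\max_{r}\;\log\mathcal{L}(r;\mathcal{D})\;\pm\;\beta\,V^{\star}_{r},$$

where $\log\mathcal{L}$ is the preference-data likelihood, $V^{\star}_{r}$ is the optimal value under reward $r$, and the sign choice plays the role of optimism (online) or pessimism (offline).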
arXiv Detail & Related papers (2024-05-29T17:51:42Z) - Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence [15.720824593964027]
This paper introduces a new policy gradient method for risk-sensitive DRL with general coherent risk measures. For practical use, we design a categorical distributional policy gradient algorithm (GCDP) that approximates any distribution by a categorical family supported on some fixed points.
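Coherent risk measures such as CVaR are straightforward to evaluate on the categorical family the abstract mentions. A generic sketch (not the paper's algorithm) of lower-tail CVaR on fixed atoms:

```python
# CVaR of a categorical return distribution supported on fixed atoms:
# the expected return over the worst alpha fraction of outcomes.
import numpy as np

def cvar_categorical(atoms, probs, alpha=0.25):
    order = np.argsort(atoms)                    # sort support ascending
    z = np.asarray(atoms, dtype=float)[order]
    p = np.asarray(probs, dtype=float)[order]
    cum = np.cumsum(p)
    prev = np.concatenate(([0.0], cum[:-1]))
    w = np.clip(np.minimum(cum, alpha) - prev, 0.0, None)  # tail mass per atom
    return float(w @ z / alpha)

# Uniform on {0,1,2,3}: the worst half is {0,1}, so CVaR_0.5 = 0.5.
print(cvar_categorical([0, 1, 2, 3], [0.25] * 4, alpha=0.5))
```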
arXiv Detail & Related papers (2024-05-23T16:16:58Z) - Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
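For reference, the standard Rockafellar-Uryasev variational form of lower-tail CVaR for a reward variable $X$, which the "Iterated" objective composes across time steps:

$$\mathrm{CVaR}_{\alpha}(X)=\max_{t\in\mathbb{R}}\Big\{\,t-\tfrac{1}{\alpha}\,\mathbb{E}\big[(t-X)_{+}\big]\Big\}.$$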
arXiv Detail & Related papers (2023-07-06T08:14:54Z) - Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
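A trajectory-level preference model commonly assumed in PbRL, and covered by the general reward setting above, is the Bradley-Terry model:

$$\mathbb{P}(\tau_{1}\succ\tau_{2})=\sigma\big(r(\tau_{1})-r(\tau_{2})\big)=\frac{\exp r(\tau_{1})}{\exp r(\tau_{1})+\exp r(\tau_{2})},$$

where $r(\tau)$ is a reward defined over the whole trajectory.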
arXiv Detail & Related papers (2023-05-24T07:11:26Z) - Offline Policy Optimization in RL with Variance Regularization [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer.
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithms.
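The double-sampling issue and the Fenchel fix fit in one line. The variance contains $(\mathbb{E}[X])^{2}$, whose gradient naively requires two independent samples; the conjugate identity $y^{2}=\max_{\nu}(2\nu y-\nu^{2})$ gives

$$(\mathbb{E}[X])^{2}=\max_{\nu\in\mathbb{R}}\big(2\nu\,\mathbb{E}[X]-\nu^{2}\big),$$

which is linear in the expectation, so a single sample yields an unbiased gradient (a textbook identity; the paper applies it with stationary distribution corrections).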
arXiv Detail & Related papers (2022-12-29T18:25:01Z) - Offline Reinforcement Learning with Adaptive Behavior Regularization [1.491109220586182]
Offline reinforcement learning (RL) defines a sample-efficient learning paradigm, where a policy is learned from static and previously collected datasets.
We propose a novel approach, which we refer to as adaptive behavior regularization (ABR).
ABR enables the policy to adaptively adjust its optimization objective between cloning and improving over the policy used to generate the dataset.
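One generic form such an adaptively weighted objective can take (the interpolation below is illustrative; ABR's actual weighting rule is not given in this summary):

$$\mathcal{L}(\pi)=\lambda\,\mathbb{E}_{(s,a)\sim\mathcal{D}}\big[-\log\pi(a\mid s)\big]\;-\;(1-\lambda)\,\mathbb{E}_{s\sim\mathcal{D}}\big[Q\big(s,\pi(s)\big)\big],$$

where the first term clones the dataset policy, the second improves over it, and $\lambda$ is adjusted adaptively during training.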
arXiv Detail & Related papers (2022-11-15T15:59:11Z) - BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning [14.432131909590824]
Offline Reinforcement Learning aims to train effective policies using previously collected datasets.
Standard off-policy RL algorithms are prone to overestimations of the values of out-of-distribution (less explored) actions.
We improve behavior-regularized offline reinforcement learning and propose BRAC+.
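The behavior-regularized actor-critic template that BRAC+ builds on penalizes divergence from the behavior policy $\pi_{\beta}$ that generated the dataset (a standard form; the specific "+" improvements are not detailed in this summary):

$$\max_{\pi}\;\mathbb{E}_{s\sim\mathcal{D}}\Big[\mathbb{E}_{a\sim\pi(\cdot\mid s)}Q(s,a)\;-\;\alpha\,D\big(\pi(\cdot\mid s)\,\big\|\,\pi_{\beta}(\cdot\mid s)\big)\Big],$$

where $D$ is a divergence such as KL; keeping $\pi$ close to well-supported actions curbs the out-of-distribution value overestimation noted above.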
arXiv Detail & Related papers (2021-10-02T23:55:49Z)