FairGRPO: Fair Reinforcement Learning for Equitable Clinical Reasoning
- URL: http://arxiv.org/abs/2510.19893v1
- Date: Wed, 22 Oct 2025 17:26:16 GMT
- Title: FairGRPO: Fair Reinforcement Learning for Equitable Clinical Reasoning
- Authors: Shiqi Dai, Wei Dai, Jiaee Cheong, Paul Pu Liang,
- Abstract summary: We introduce Fairness-aware Group Relative Policy Optimization (FairGRPO), a hierarchical reinforcement learning approach that promotes equitable learning across heterogeneous clinical populations. We demonstrate that FairGRPO reduces predictive parity by 27.2% against all vanilla and bias-mitigated RL baselines, while improving F1 score by 12.49%. Based on FairGRPO, we release FairMedGemma-4B, a fairness-aware clinical VLLM that achieves state-of-the-art performance while demonstrating significantly reduced disparities across demographic groups.
- Score: 29.271963682064044
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Medical artificial intelligence systems have achieved remarkable diagnostic capabilities, yet they consistently exhibit performance disparities across demographic groups, causing real-world harm to underrepresented populations. While recent multimodal reasoning foundation models have advanced clinical diagnosis through integrated analysis of diverse medical data, reasoning training via reinforcement learning inherits and often amplifies biases present in training datasets dominated by majority populations. We introduce Fairness-aware Group Relative Policy Optimization (FairGRPO), a hierarchical reinforcement learning approach that promotes equitable learning across heterogeneous clinical populations. FairGRPO employs adaptive importance weighting of advantages based on representation, task difficulty, and data source. To address the common issue of missing demographic labels in the clinical domain, we further employ unsupervised clustering, which automatically discovers latent demographic groups when labels are unavailable. Through comprehensive experiments on 7 clinical diagnostic datasets spanning 5 modalities (X-ray, CT, dermoscopy, mammography, and ultrasound), we demonstrate that FairGRPO reduces predictive parity by 27.2% against all vanilla and bias-mitigated RL baselines, while improving F1 score by 12.49%. Furthermore, training-dynamics analysis reveals that FairGRPO progressively improves fairness throughout optimization, while baseline RL methods exhibit deteriorating fairness as training progresses. Based on FairGRPO, we release FairMedGemma-4B, a fairness-aware clinical VLLM that achieves state-of-the-art performance with significantly reduced disparities across demographic groups.
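The adaptive advantage weighting described in the abstract can be sketched as follows. This is a minimal illustration under assumptions: the inverse-frequency weighting rule and the function names `group_weights` and `fair_advantages` are illustrative, not the paper's exact formulation, which also factors in task difficulty and data source and falls back to unsupervised clustering when demographic labels are missing.

```python
# Hedged sketch of FairGRPO-style fairness-aware advantage weighting.
# The inverse-frequency rule below is an illustrative assumption, not
# the paper's published algorithm.
import numpy as np
from collections import Counter

def group_weights(group_ids):
    """Upweight samples from underrepresented groups via inverse frequency,
    normalized so a perfectly balanced batch gets uniform weight 1."""
    counts = Counter(group_ids)
    n = len(group_ids)
    return np.array([n / (len(counts) * counts[g]) for g in group_ids])

def fair_advantages(rewards, group_ids):
    """GRPO-style group-relative advantages (reward standardized within the
    rollout group), rescaled by representation-based importance weights."""
    rewards = np.asarray(rewards, dtype=float)
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    return adv * group_weights(group_ids)

rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
groups  = ["a", "a", "a", "a", "b", "b"]  # group "b" is underrepresented
adv = fair_advantages(rewards, groups)
```

Under this toy rule, each majority-group sample gets weight 0.75 and each minority-group sample 1.5, so the policy gradient is pushed harder on the groups the batch underrepresents.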
Related papers
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks. We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models. Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z) - Developing Fairness-Aware Task Decomposition to Improve Equity in Post-Spinal Fusion Complication Prediction [3.860970992977915]
We propose a fairness-aware multitask learning framework for postoperative complication prediction. FAIR-MTL employs a data-driven subgroup inference mechanism. It achieves an AUC of 0.86 and an accuracy of 75%, outperforming single-task baselines while substantially reducing bias.
arXiv Detail & Related papers (2025-11-29T19:06:07Z) - FAST-CAD: A Fairness-Aware Framework for Non-Contact Stroke Diagnosis [8.939811267715228]
We propose FAST-CAD, a theoretically grounded framework that combines domain-adversarial training with group distributionally robust optimization. Our approach builds on domain adaptation and minimax fairness theory and provides convergence and fairness guarantees. Experiments show that our method achieves superior diagnostic performance while maintaining fairness across demographic groups.
arXiv Detail & Related papers (2025-11-12T01:40:58Z) - Achieving Fairness Without Harm via Selective Demographic Experts [16.212815178841087]
Bias mitigation techniques often impose a trade-off between fairness and accuracy. In high-stakes domains like clinical diagnosis, such trade-offs are ethically and practically unacceptable. We propose a fairness-without-harm approach by learning distinct representations for different demographic groups.
arXiv Detail & Related papers (2025-11-09T09:11:02Z) - Bias and Generalizability of Foundation Models across Datasets in Breast Mammography [4.117899774444893]
We explore the fairness and bias of foundation models (FMs) for breast mammography classification. We leverage a large pool of datasets from diverse sources, including data from underrepresented regions and an in-house dataset. Our experiments show that while modality-specific pre-training of FMs enhances performance, classifiers trained on features from individual datasets fail to generalize across domains.
arXiv Detail & Related papers (2025-05-14T06:56:17Z) - Mitigating Group-Level Fairness Disparities in Federated Visual Language Models [115.16940773660104]
This paper introduces FVL-FP, a novel framework that combines FL with fair prompt tuning techniques. We focus on mitigating demographic biases while preserving model performance. Our approach reduces demographic disparity by an average of 45% compared to standard FL approaches.
arXiv Detail & Related papers (2025-05-03T16:09:52Z) - FairREAD: Re-fusing Demographic Attributes after Disentanglement for Fair Medical Image Classification [3.615240611746158]
We propose Fair Re-fusion After Disentanglement (FairREAD), a framework that mitigates unfairness by re-integrating sensitive demographic attributes into fair image representations. FairREAD employs adversarial training to disentangle demographic information while using a controlled re-fusion mechanism to preserve clinically relevant details. Comprehensive evaluations on a large-scale clinical X-ray dataset demonstrate that FairREAD significantly reduces unfairness metrics while maintaining diagnostic accuracy.
arXiv Detail & Related papers (2024-12-20T22:17:57Z) - Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias [3.455189439319919]
We introduce Cross-Care, the first benchmark framework dedicated to assessing biases and real-world knowledge in large language models (LLMs).
We evaluate how demographic biases embedded in pre-training corpora like The Pile influence the outputs of LLMs.
Our results highlight substantial misalignment between LLM representation of disease prevalence and real disease prevalence rates across demographic subgroups.
arXiv Detail & Related papers (2024-05-09T02:33:14Z) - Fairness Evolution in Continual Learning for Medical Imaging [47.52603262576663]
This study examines how bias evolves across tasks using domain-specific fairness metrics and how different CL strategies impact this evolution. Our results show that Learning without Forgetting and Pseudo-Label achieve optimal classification performance, but Pseudo-Label is less biased.
arXiv Detail & Related papers (2024-04-10T09:48:52Z) - How does promoting the minority fraction affect generalization? A theoretical study of the one-hidden-layer neural network on group imbalance [64.1656365676171]
Group imbalance has been a known problem in empirical risk minimization.
This paper quantifies the impact of individual groups on the sample complexity, the convergence rate, and the average and group-level testing performance.
arXiv Detail & Related papers (2024-03-12T04:38:05Z) - Evaluating the Fairness of the MIMIC-IV Dataset and a Baseline Algorithm: Application to the ICU Length of Stay Prediction [65.268245109828]
This paper uses the MIMIC-IV dataset to examine the fairness and bias in an XGBoost binary classification model predicting the ICU length of stay.
The research reveals class imbalances in the dataset across demographic attributes and employs data preprocessing and feature extraction.
The paper concludes with recommendations for fairness-aware machine learning techniques for mitigating biases and the need for collaborative efforts among healthcare professionals and data scientists.
arXiv Detail & Related papers (2023-12-31T16:01:48Z) - Generative models improve fairness of medical classifiers under distribution shifts [49.10233060774818]
We show that learning realistic augmentations automatically from data is possible in a label-efficient manner using generative models.
We demonstrate that these learned augmentations can surpass heuristic ones, making models more robust and statistically fair both in- and out-of-distribution.
arXiv Detail & Related papers (2023-04-18T18:15:38Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
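Several of the papers above report fairness as a gap in group-conditional metrics. A minimal sketch of one such measure, the predictive-parity gap (the spread in precision, P(y=1 | ŷ=1), across demographic groups); the exact definition each paper uses may differ, and the data below is made up for demonstration:

```python
# Hedged sketch: predictive-parity gap as the max-min spread in
# group-conditional precision. A generic illustration, not any one
# paper's exact fairness metric.
from collections import defaultdict

def predictive_parity_gap(y_true, y_pred, groups):
    tp = defaultdict(int)  # true positives per group
    pp = defaultdict(int)  # predicted positives per group
    for yt, yp, g in zip(y_true, y_pred, groups):
        if yp == 1:
            pp[g] += 1
            tp[g] += int(yt == 1)
    precisions = [tp[g] / pp[g] for g in pp if pp[g] > 0]
    return max(precisions) - min(precisions)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 1]
grp    = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = predictive_parity_gap(y_true, y_pred, grp)
```

Here group "a" has precision 2/3 and group "b" has 1/3, giving a gap of 1/3; a perfectly fair classifier under this criterion would score 0.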
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.