One Size Fits None: Rethinking Fairness in Medical AI
- URL: http://arxiv.org/abs/2506.14400v1
- Date: Tue, 17 Jun 2025 10:59:02 GMT
- Title: One Size Fits None: Rethinking Fairness in Medical AI
- Authors: Roland Roller, Michael Hahn, Ajay Madhavan Ravichandran, Bilgin Osmanodja, Florian Oetke, Zeineb Sassi, Aljoscha Burchardt, Klaus Netter, Klemens Budde, Anne Herrmann, Tobias Strapatsas, Peter Dabrock, Sebastian Möller
- Abstract summary: Real-world medical datasets are often noisy, incomplete, and imbalanced. The resulting performance differences across patient subgroups raise fairness concerns, particularly when they reinforce existing disadvantages for marginalized groups.
- Score: 7.163867603298375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) models are increasingly used to support clinical decision-making. However, real-world medical datasets are often noisy, incomplete, and imbalanced, leading to performance disparities across patient subgroups. These differences raise fairness concerns, particularly when they reinforce existing disadvantages for marginalized groups. In this work, we analyze several medical prediction tasks and demonstrate how model performance varies with patient characteristics. While ML models may demonstrate good overall performance, we argue that subgroup-level evaluation is essential before integrating them into clinical workflows. By conducting a performance analysis at the subgroup level, differences can be clearly identified, allowing, on the one hand, for performance disparities to be considered in clinical practice, and on the other hand, for these insights to inform the responsible development of more effective models. In this way, our work contributes to a practical discussion around the subgroup-sensitive development and deployment of medical ML models and the interconnectedness of fairness and transparency.
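As a concrete illustration of the subgroup-level evaluation the abstract calls for, here is a minimal sketch in Python; the DataFrame columns (`y_true`, `y_score`, `sex`, `age_group`) are hypothetical placeholders, not the paper's actual data or code.

```python
# Minimal sketch of a subgroup-level performance analysis, assuming a binary
# risk model and a pandas DataFrame `df` with columns `y_true`, `y_score`,
# and demographic attributes such as `sex` or `age_group` (hypothetical names).
import pandas as pd
from sklearn.metrics import roc_auc_score, recall_score

def subgroup_report(df: pd.DataFrame, group_col: str, threshold: float = 0.5) -> pd.DataFrame:
    """Report the same metrics separately for each patient subgroup."""
    rows = []
    for name, sub in df.groupby(group_col):
        y_pred = (sub["y_score"] >= threshold).astype(int)
        rows.append({
            group_col: name,
            "n": len(sub),
            # AUROC requires both outcome classes to be present in the subgroup.
            "auroc": roc_auc_score(sub["y_true"], sub["y_score"]),
            "sensitivity": recall_score(sub["y_true"], y_pred, zero_division=0),
        })
    return pd.DataFrame(rows).sort_values("n", ascending=False)

# Usage: compare the same model across subgroups before clinical deployment.
# print(subgroup_report(df, "sex"))
# print(subgroup_report(df, "age_group"))
```

Reporting subgroup sample sizes alongside the metrics matters: small strata make apparent disparities statistically fragile.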
Related papers
- Inspecting Model Fairness in Ultrasound Segmentation Tasks [20.281029492841878]
We inspect a series of deep learning (DL) segmentation models using two ultrasound datasets.
Our findings reveal that even state-of-the-art DL algorithms demonstrate unfair behavior in ultrasound segmentation tasks.
These results serve as a crucial warning, underscoring the necessity for careful model evaluation before their deployment in real-world scenarios.
arXiv Detail & Related papers (2023-12-05T05:08:08Z)
- TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z)
- The Role of Subgroup Separability in Group-Fair Medical Image Classification [18.29079361470428]
We find a relationship between subgroup separability, subgroup disparities, and performance degradation when models are trained on data with systematic bias such as underdiagnosis.
Our findings shed new light on the question of how models become biased, providing important insights for the development of fair medical imaging AI.
arXiv Detail & Related papers (2023-07-06T06:06:47Z)
- Auditing ICU Readmission Rates in an Clinical Database: An Analysis of Risk Factors and Clinical Outcomes [0.0]
This study presents a machine learning pipeline for clinical data classification in the context of a 30-day readmission problem.
The fairness audit uncovers disparities in equal opportunity, predictive parity, false positive rate parity, and false negative rate parity criteria.
The study suggests the need for collaborative efforts among researchers, policymakers, and practitioners to address bias and fairness in artificial intelligence (AI) systems.
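The four criteria named above reduce to comparing confusion-matrix rates between subgroups; a hedged sketch of such an audit (illustrative variable names, not the study's pipeline) follows.

```python
# Illustrative computation of the four audit criteria: equal opportunity
# compares TPR, predictive parity compares PPV, plus FPR and FNR parity.
# Inputs are hypothetical arrays, not the study's data; assumes each group
# contains both classes and at least one positive prediction.
import numpy as np

def group_rates(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return {
        "tpr": tp / (tp + fn),   # equal opportunity
        "ppv": tp / (tp + fp),   # predictive parity
        "fpr": fp / (fp + tn),   # false positive rate parity
        "fnr": fn / (fn + tp),   # false negative rate parity
    }

def parity_gaps(y_true, y_pred, group):
    """Largest between-group gap for each criterion; 0 means perfect parity."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    per_group = {g: group_rates(y_true[group == g], y_pred[group == g])
                 for g in np.unique(group)}
    return {m: max(r[m] for r in per_group.values())
               - min(r[m] for r in per_group.values())
            for m in ("tpr", "ppv", "fpr", "fnr")}
```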
arXiv Detail & Related papers (2023-04-12T17:09:38Z)
- Ambiguous Medical Image Segmentation using Diffusion Models [60.378180265885945]
We introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights.
Our proposed model generates a distribution of segmentation masks by leveraging the inherent sampling process of diffusion.
Comprehensive results show that our proposed approach outperforms existing state-of-the-art ambiguous segmentation networks.
arXiv Detail & Related papers (2023-04-10T17:58:22Z)
- Evaluating the Fairness of Deep Learning Uncertainty Estimates in Medical Image Analysis [3.5536769591744557]
Deep learning (DL) models have shown great success in many medical image analysis tasks.
However, deployment of the resulting models into real clinical contexts requires robustness and fairness across different sub-populations.
Recent studies have shown significant biases in DL models across demographic subgroups, indicating a lack of fairness in the models.
arXiv Detail & Related papers (2023-03-06T16:01:30Z)
- Detecting Shortcut Learning for Fair Medical AI using Shortcut Testing [62.9062883851246]
Machine learning holds great promise for improving healthcare, but it is critical to ensure that its use will not propagate or amplify health disparities.
One potential driver of algorithmic unfairness, shortcut learning, arises when ML models base predictions on improper correlations in the training data.
Using multi-task learning, we propose the first method to assess and mitigate shortcut learning as a part of the fairness assessment of clinical ML systems.
arXiv Detail & Related papers (2022-07-21T09:35:38Z)
- Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
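As an example of what such a post-hoc analysis can look like, the sketch below uses permutation importance from scikit-learn; the estimator and data names are placeholders, and the benchmark itself may rely on different attribution methods.

```python
# Generic permutation-importance sketch for post-hoc feature attribution.
# `effect_model` stands in for any fitted scikit-learn-compatible model and
# (X_val, y_val, feature_names) for held-out data (placeholders only).
import numpy as np
from sklearn.inspection import permutation_importance

def rank_features(effect_model, X_val, y_val, feature_names, n_repeats=20, seed=0):
    result = permutation_importance(
        effect_model, X_val, y_val,
        scoring="neg_mean_squared_error",  # any regression scorer works here
        n_repeats=n_repeats, random_state=seed,
    )
    # Features whose permutation hurts performance most are ranked first.
    order = np.argsort(result.importances_mean)[::-1]
    return [(feature_names[i], float(result.importances_mean[i])) for i in order]
```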
arXiv Detail & Related papers (2022-06-16T17:59:05Z)
- What Do You See in this Patient? Behavioral Testing of Clinical NLP Models [69.09570726777817]
We introduce an extendable testing framework that evaluates the behavior of clinical outcome models regarding changes of the input.
We show that model behavior varies drastically even when fine-tuned on the same data and that allegedly best-performing models have not always learned the most medically plausible patterns.
arXiv Detail & Related papers (2021-11-30T15:52:04Z)
- Explaining medical AI performance disparities across sites with confounder Shapley value analysis [8.785345834486057]
Multi-site evaluations are key to diagnosing such disparities.
Our framework provides a method for quantifying the marginal and cumulative effect of each type of bias on the overall performance difference.
We demonstrate its usefulness in a case study of a deep learning model trained to detect the presence of pneumothorax.
arXiv Detail & Related papers (2021-11-12T18:54:10Z)
- Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach [131.20444904674494]
We tackle the problem of measuring group fairness under unawareness of sensitive attributes.
We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem.
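As one concrete instance of a quantification approach (not necessarily the method used in the paper), the sketch below applies adjusted classify-and-count to estimate the prevalence of an unobserved sensitive group within a model's positive predictions; all names and numbers are illustrative.

```python
# Hedged sketch of adjusted classify-and-count (ACC), a standard quantification
# baseline: estimate the prevalence of a sensitive group in a target sample
# from an auxiliary attribute classifier's hard predictions, corrected by that
# classifier's TPR/FPR measured on a small labelled calibration set.
import numpy as np

def acc_prevalence(attr_pred_target, tpr, fpr):
    """attr_pred_target: 0/1 attribute predictions on the target sample;
    tpr/fpr: the attribute classifier's rates on calibration data."""
    observed = float(np.mean(attr_pred_target))     # naive classify-and-count
    corrected = (observed - fpr) / (tpr - fpr)      # ACC correction
    return float(np.clip(corrected, 0.0, 1.0))

# Example use for fairness under unawareness: estimate the share of a protected
# group among patients the clinical model flags as high risk, without knowing
# any individual's attribute (illustrative values).
# prev_flagged = acc_prevalence(attr_clf.predict(X_flagged), tpr=0.92, fpr=0.07)
```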
arXiv Detail & Related papers (2021-09-17T13:45:46Z)
- Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
Adversarially generated samples are used during the adaptation process.
Results confirm the effectiveness of our method and its generality across different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)