Who Does Your Algorithm Fail? Investigating Age and Ethnic Bias in the MAMA-MIA Dataset
- URL: http://arxiv.org/abs/2510.27421v1
- Date: Fri, 31 Oct 2025 12:20:31 GMT
- Title: Who Does Your Algorithm Fail? Investigating Age and Ethnic Bias in the MAMA-MIA Dataset
- Authors: Aditya Parikh, Sneha Das, Aasa Feragen
- Abstract summary: We audit the fairness of the automated segmentation labels provided in the breast cancer tumor segmentation dataset MAMA-MIA. Our analysis reveals an intrinsic age-related bias against younger patients that persists even after controlling for confounding factors such as data source.
- Score: 8.774604259603304
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models aim to improve diagnostic workflows, but fairness evaluation remains underexplored beyond classification, e.g., in image segmentation. Unaddressed segmentation bias can lead to disparities in the quality of care for certain populations, potentially compounded across clinical decision points and amplified through iterative model development. Here, we audit the fairness of the automated segmentation labels provided in the breast cancer tumor segmentation dataset MAMA-MIA. We evaluate automated segmentation quality across age, ethnicity, and data source. Our analysis reveals an intrinsic age-related bias against younger patients that persists even after controlling for confounding factors, such as data source. We hypothesize that this bias may be linked to physiological factors, a known challenge for both radiologists and automated systems. Finally, we show how aggregating data from multiple data sources influences site-specific ethnic biases, underscoring the necessity of investigating data at a granular level.
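The audit the abstract describes reduces to scoring each case's automated mask against a reference and comparing score distributions across subgroups. A minimal sketch of that idea, assuming a per-case Dice metric and toy age groups (this is an illustration, not the authors' actual pipeline or the MAMA-MIA data):

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice overlap between two binary segmentation masks."""
    inter = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 1.0 if total == 0 else 2.0 * inter / total

def audit_by_group(scores, groups):
    """Mean Dice per subgroup; large gaps between groups flag potential bias."""
    means = {}
    for g in set(groups):
        vals = [s for s, gg in zip(scores, groups) if gg == g]
        means[g] = sum(vals) / len(vals)
    return means

# Toy example: one case per group, with the "younger" prediction
# missing half of the reference tumor.
truth = np.array([[1, 1, 0, 0]])
good = np.array([[1, 1, 0, 0]])   # matches the reference exactly
bad = np.array([[1, 0, 0, 0]])    # misses half the foreground
scores = [dice(good, truth), dice(bad, truth)]
print(audit_by_group(scores, ["older", "younger"]))
```

In practice the paper's analysis additionally stratifies by ethnicity and data source to separate site effects from demographic effects; the gap between group means (here 1.0 vs. about 0.67) is the kind of disparity such an audit surfaces.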
Related papers
- On the Role of Calibration in Benchmarking Algorithmic Fairness for Skin Cancer Detection [0.03066137405373616]
We assess the performance of the leading skin cancer detection algorithm of the ISIC 2020 Challenge dataset. We compare it with the second- and third-place models, focusing on subgroups defined by sex, race, and age. Our findings reveal that while existing models enhance discriminative accuracy, they often over-diagnose risk and exhibit calibration issues when applied to new datasets.
arXiv Detail & Related papers (2025-11-10T23:51:04Z)
- Investigating Label Bias and Representational Sources of Age-Related Disparities in Medical Segmentation [8.774604259603304]
Algorithmic bias in medical imaging can perpetuate health disparities. In breast cancer segmentation, models exhibit significant performance disparities against younger patients. This work introduces a systematic framework for diagnosing algorithmic bias in medical segmentation.
arXiv Detail & Related papers (2025-11-01T10:06:30Z)
- Detecting Dataset Bias in Medical AI: A Generalized and Modality-Agnostic Auditing Framework [8.017827642932746]
Generalized Attribute Utility and Detectability-Induced bias Testing (G-AUDIT) for datasets is a modality-agnostic dataset auditing framework. Our method examines the relationship between task-level annotations and data properties, including patient attributes. G-AUDIT successfully identifies subtle biases commonly overlooked by traditional qualitative methods.
arXiv Detail & Related papers (2025-03-13T02:16:48Z)
- A Demographic-Conditioned Variational Autoencoder for fMRI Distribution Sampling and Removal of Confounds [49.34500499203579]
We create a variational autoencoder (VAE)-based model, DemoVAE, to decorrelate fMRI features from demographics.
We generate high-quality synthetic fMRI data based on user-supplied demographics.
arXiv Detail & Related papers (2024-05-13T17:49:20Z)
- Fairness-Aware Data Augmentation for Cardiac MRI using Text-Conditioned Diffusion Models [1.6581402323174208]
We propose a method to alleviate imbalances inherent in datasets through the generation of synthetic data. We adopt ControlNet based on a denoising diffusion probabilistic model to condition on text assembled from patient metadata and cardiac geometry. Our experiments demonstrate the effectiveness of the proposed approach in mitigating dataset imbalances.
arXiv Detail & Related papers (2024-03-28T15:41:43Z)
- Evaluating the Fairness of the MIMIC-IV Dataset and a Baseline Algorithm: Application to the ICU Length of Stay Prediction [65.268245109828]
This paper uses the MIMIC-IV dataset to examine the fairness and bias in an XGBoost binary classification model predicting the ICU length of stay.
The research reveals class imbalances in the dataset across demographic attributes and employs data preprocessing and feature extraction.
The paper concludes with recommendations for fairness-aware machine learning techniques for mitigating biases and the need for collaborative efforts among healthcare professionals and data scientists.
arXiv Detail & Related papers (2023-12-31T16:01:48Z)
- An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets [32.25265709333831]
We present a data-centric, model-agnostic, task-agnostic approach to evaluate dataset bias by investigating how easily different groups are learned at small sample sizes (AEquity).
We then apply a systematic analysis of AEq values across subpopulations to identify manifestations of racial bias in two known cases in healthcare.
AEq is a novel and broadly applicable metric that can be applied to advance equity by diagnosing and remediating bias in healthcare datasets.
arXiv Detail & Related papers (2023-11-06T17:08:41Z)
- Multi-task Explainable Skin Lesion Classification [54.76511683427566]
We propose a few-shot-based approach for skin lesions that generalizes well with little labelled data.
The proposed approach comprises a fusion of a segmentation network, which acts as an attention module, and a classification network.
arXiv Detail & Related papers (2023-10-11T05:49:47Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model that extracts common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, a simple yet effective meta-learning method for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)