Comparative assessment of fairness definitions and bias mitigation strategies in machine learning-based diagnosis of Alzheimer's disease from MR images
- URL: http://arxiv.org/abs/2505.23528v1
- Date: Thu, 29 May 2025 15:07:19 GMT
- Title: Comparative assessment of fairness definitions and bias mitigation strategies in machine learning-based diagnosis of Alzheimer's disease from MR images
- Authors: Maria Eleftheria Vlontzou, Maria Athanasiou, Christos Davatzikos, Konstantina S. Nikita
- Abstract summary: The present study performs a fairness analysis of machine learning (ML) models for the diagnosis of Mild Cognitive Impairment (MCI) and Alzheimer's disease (AD) from MRI-derived neuroimaging features. Biases associated with age, race, and gender in a multi-cohort dataset are investigated. Results reveal the existence of biases related to age and race, while no significant gender bias is observed.
- Score: 4.569587135821805
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The present study performs a comprehensive fairness analysis of machine learning (ML) models for the diagnosis of Mild Cognitive Impairment (MCI) and Alzheimer's disease (AD) from MRI-derived neuroimaging features. Biases associated with age, race, and gender in a multi-cohort dataset, as well as the influence of proxy features encoding these sensitive attributes, are investigated. The reliability of various fairness definitions and metrics in the identification of such biases is also assessed. Based on the most appropriate fairness measures, a comparative analysis of widely used pre-processing, in-processing, and post-processing bias mitigation strategies is performed. Moreover, a novel composite measure is introduced to quantify the trade-off between fairness and performance by considering the F1-score and the equalized odds ratio, making it appropriate for medical diagnostic applications. The obtained results reveal the existence of biases related to age and race, while no significant gender bias is observed. The deployed mitigation strategies yield varying improvements in terms of fairness across the different sensitive attributes and studied subproblems. For race and gender, Reject Option Classification improves equalized odds by 46% and 57%, respectively, and achieves harmonic mean scores of 0.75 and 0.80 in the MCI versus AD subproblem, whereas for age, in the same subproblem, adversarial debiasing yields the highest equalized odds improvement of 40% with a harmonic mean score of 0.69. Insights are provided into how variations in AD neuropathology and risk factors, associated with demographic characteristics, influence model fairness.
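The abstract's composite measure (a harmonic mean of the F1-score and the equalized odds ratio) can be sketched in code. The exact definitions are not given in the abstract, so this is a minimal sketch assuming the common min/max formulation of the equalized odds ratio across subgroups (as in fairlearn) and a plain harmonic mean; the function names are illustrative, not the authors' own.

```python
def _rates(y_true, y_pred, idx):
    """True-positive and false-positive rates within the subgroup given by index list idx."""
    pos = [y_pred[i] for i in idx if y_true[i] == 1]
    neg = [y_pred[i] for i in idx if y_true[i] == 0]
    tpr = sum(pos) / len(pos) if pos else 0.0
    fpr = sum(neg) / len(neg) if neg else 0.0
    return tpr, fpr

def equalized_odds_ratio(y_true, y_pred, groups):
    """Smaller of the min/max ratios of subgroup TPRs and FPRs; 1.0 means perfectly fair."""
    by_group = {}
    for i, g in enumerate(groups):
        by_group.setdefault(g, []).append(i)
    tprs, fprs = zip(*(_rates(y_true, y_pred, idx) for idx in by_group.values()))
    ratio = lambda v: min(v) / max(v) if max(v) > 0 else 1.0
    return min(ratio(tprs), ratio(fprs))

def f1_score(y_true, y_pred):
    """Standard F1 = 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def fairness_performance_score(y_true, y_pred, groups):
    """Harmonic mean of F1 and equalized odds ratio, trading off performance and fairness."""
    f1 = f1_score(y_true, y_pred)
    eor = equalized_odds_ratio(y_true, y_pred, groups)
    return 2 * f1 * eor / (f1 + eor) if (f1 + eor) else 0.0
```

Because both inputs lie in [0, 1], a harmonic mean penalizes a model that scores well on one axis but poorly on the other, which is why such a composite suits diagnostic settings where neither accuracy nor fairness can be sacrificed.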
Related papers
- Incorporating Rather Than Eliminating: Achieving Fairness for Skin Disease Diagnosis Through Group-Specific Expert [18.169924728540487]
We introduce FairMoE, a framework that employs layer-wise mixture-of-experts modules to serve as group-specific learners. Unlike traditional methods that rigidly assign data based on group labels, FairMoE dynamically routes data to the most suitable expert.
arXiv Detail & Related papers (2025-06-21T18:42:00Z) - On the Bias, Fairness, and Bias Mitigation for a Wearable-based Freezing of Gait Detection in Parkinson's Disease [0.20971479389679332]
Freezing of gait (FOG) is a debilitating feature of Parkinson's disease (PD). Recent advances in wearable-based human activity recognition (HAR) technology have enabled the detection of FOG subtypes across benchmark datasets. We evaluated the bias and fairness of HAR models for wearable-based FOG detection across demographics and PD conditions.
arXiv Detail & Related papers (2025-01-29T18:43:01Z) - Comprehensive Methodology for Sample Augmentation in EEG Biomarker Studies for Alzheimers Risk Classification [0.0]
Alzheimer's disease (AD), the leading type of dementia, accounts for 70% of cases. EEG measures show promise in identifying AD risk, but obtaining large samples for reliable comparisons is challenging. This study integrates signal processing, harmonization, and statistical techniques to enhance sample size and improve AD risk classification reliability.
arXiv Detail & Related papers (2024-11-20T10:31:02Z) - AXIAL: Attention-based eXplainability for Interpretable Alzheimer's Localized Diagnosis using 2D CNNs on 3D MRI brain scans [43.06293430764841]
This study presents an innovative method for Alzheimer's disease diagnosis using 3D MRI designed to enhance the explainability of model decisions.
Our approach adopts a soft attention mechanism, enabling 2D CNNs to extract volumetric representations.
With voxel-level precision, our method identifies the specific brain regions the model attends to, highlighting the predominant areas driving its decisions.
arXiv Detail & Related papers (2024-07-02T16:44:00Z) - Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z) - Fairness Evolution in Continual Learning for Medical Imaging [47.52603262576663]
We study the behavior of Continual Learning (CL) strategies in medical imaging regarding classification performance.
We evaluate the Replay, Learning without Forgetting (LwF), and Pseudo-Label strategies.
LwF and Pseudo-Label exhibit optimal classification performance, but when including fairness metrics in the evaluation, it is clear that Pseudo-Label is less biased.
arXiv Detail & Related papers (2024-04-10T09:48:52Z) - FERI: A Multitask-based Fairness Achieving Algorithm with Applications to Fair Organ Transplantation [15.481475313958219]
We introduce Fairness through the Equitable Rate of Improvement in Multitask Learning (FERI) algorithm for fair predictions of graft failure risk in liver transplant patients.
FERI constrains subgroup loss by balancing learning rates and preventing subgroup dominance in the training process.
arXiv Detail & Related papers (2023-10-20T21:14:07Z) - Auditing ICU Readmission Rates in a Clinical Database: An Analysis of Risk Factors and Clinical Outcomes [0.0]
This study presents a machine learning pipeline for clinical data classification in the context of a 30-day readmission problem.
The fairness audit uncovers disparities in equal opportunity, predictive parity, false positive rate parity, and false negative rate parity criteria.
The study suggests the need for collaborative efforts among researchers, policymakers, and practitioners to address bias and fairness in artificial intelligence (AI) systems.
arXiv Detail & Related papers (2023-04-12T17:09:38Z) - Evaluating Probabilistic Classifiers: The Triptych [62.997667081978825]
We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance.
The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value.
arXiv Detail & Related papers (2023-01-25T19:35:23Z) - Explaining medical AI performance disparities across sites with confounder Shapley value analysis [8.785345834486057]
Multi-site evaluations are key to diagnosing performance disparities across sites.
Our framework provides a method for quantifying the marginal and cumulative effect of each type of bias on the overall performance difference.
We demonstrate its usefulness in a case study of a deep learning model trained to detect the presence of pneumothorax.
arXiv Detail & Related papers (2021-11-12T18:54:10Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - Estimating and Improving Fairness with Adversarial Learning [65.99330614802388]
We propose an adversarial multi-task training strategy to simultaneously mitigate and detect bias in the deep learning-based medical image analysis system.
Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model.
We evaluate our framework on a large-scale, publicly available skin lesion dataset.
arXiv Detail & Related papers (2021-03-07T03:10:32Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD). UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms state-of-the-art baselines by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z) - Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.