Related papers: An Investigation Into Race Bias in Random Forest Models Based on Breast DCE-MRI Derived Radiomics Features

URL: http://arxiv.org/abs/2309.17197v1
Date: Fri, 29 Sep 2023 12:45:53 GMT
Title: An Investigation Into Race Bias in Random Forest Models Based on Breast DCE-MRI Derived Radiomics Features
Authors: Mohamed Huti, Tiarna Lee, Elinor Sawyer, Andrew P. King
Abstract summary: We investigate the potential for race bias in random forest (RF) models trained using radiomics features. RF models trained to predict tumour molecular subtype using race-imbalanced data seem to produce biased behaviour.
Score: 1.9077771032617559
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent research has shown that artificial intelligence (AI) models can exhibit bias in performance when trained using data that are imbalanced by protected attribute(s). Most work to date has focused on deep learning models, but classical AI techniques that make use of hand-crafted features may also be susceptible to such bias. In this paper we investigate the potential for race bias in random forest (RF) models trained using radiomics features. Our application is prediction of tumour molecular subtype from dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) of breast cancer patients. Our results show that radiomics features derived from DCE-MRI data do contain race-identifiable information, and that RF models can be trained to predict White and Black race from these data with 60-70% accuracy, depending on the subset of features used. Furthermore, RF models trained to predict tumour molecular subtype using race-imbalanced data seem to produce biased behaviour, exhibiting better performance on test data from the race on which they were trained.
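The evaluation logic described in the abstract can be illustrated with a minimal Python sketch using scikit-learn on synthetic data. This is not the authors' code or data. The feature generator, the size of the group shift, the sample sizes, and the names `make_cohort`, `per_group_accuracy` and `run_experiment` are all hypothetical. The sketch shows two checks: (1) whether group membership can be predicted from the features, and (2) whether a subtype classifier trained on group-imbalanced data performs differently for each group on a balanced test set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score

N_FEATURES = 20  # hypothetical number of radiomics-like features


def make_cohort(n, group, rng):
    """Synthetic stand-in for radiomics features (assumption, not real data).

    Features 10-14 are shifted for group 1, so group membership is partly
    identifiable from the features. The binary 'subtype' label depends on
    feature 0 in group 0 but on feature 1 in group 1, so a model fitted
    mostly to group 0 transfers poorly to group 1.
    """
    X = rng.normal(size=(n, N_FEATURES))
    X[:, 10:15] += 0.5 * group
    signal = X[:, 0] if group == 0 else X[:, 1]
    y = (signal + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y, np.full(n, group)


def per_group_accuracy(model, X, y, groups):
    """Accuracy computed separately for each value of the protected attribute."""
    return {int(g): accuracy_score(y[groups == g], model.predict(X[groups == g]))
            for g in np.unique(groups)}


def run_experiment(seed=0, n_major=900, n_minor=100, n_test=500):
    rng = np.random.default_rng(seed)
    # Group-imbalanced training set: mostly group 0 (hypothetical 9:1 ratio).
    parts = [make_cohort(n_major, 0, rng), make_cohort(n_minor, 1, rng)]
    X_tr, y_tr, g_tr = (np.concatenate(a) for a in zip(*parts))
    # Balanced test set so each group is evaluated on equal footing.
    parts = [make_cohort(n_test, 0, rng), make_cohort(n_test, 1, rng)]
    X_te, y_te, g_te = (np.concatenate(a) for a in zip(*parts))

    # Check 1: how well can the group itself be predicted from the features?
    group_acc = cross_val_score(
        RandomForestClassifier(n_estimators=100, random_state=seed),
        X_te, g_te, cv=5).mean()

    # Check 2: subtype classifier trained on imbalanced data, scored per group.
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_tr, y_tr)
    return group_acc, per_group_accuracy(model, X_te, y_te, g_te)


if __name__ == "__main__":
    group_acc, subtype_acc = run_experiment()
    print(f"group-prediction accuracy: {group_acc:.2f}")
    print(f"subtype accuracy by group: {subtype_acc}")
```

Reporting accuracy separately for each group, rather than one pooled score, is what exposes a performance gap; a pooled score on a test set dominated by the majority group would largely hide it.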

Related papers

Minimum Data, Maximum Impact: 20 annotated samples for explainable lung nodule classification [0.0]
Radiologists use attributes like shape and texture as established diagnostic criteria, and mirroring these attributes in AI decision-making is a goal of explainable models. The adoption of such models is limited by the scarcity of large-scale medical image datasets annotated with these attributes. This work highlights the potential of synthetic data to overcome dataset limitations, enhancing the applicability of explainable models in medical image analysis.
arXiv (2025-08-01T13:54:34Z)
Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography [0.0]
We evaluate radiomics-based and deep learning-based approaches for disease detection in chest radiography. Deep learning models learn directly from image data, while radiomics-based models extract handcrafted features. These findings provide statistically validated, data-driven recommendations for model selection in diagnostic AI.
arXiv (2025-04-16T16:54:37Z)
Development and Comparative Analysis of Machine Learning Models for Hypoxemia Severity Triage in CBRNE Emergency Scenarios Using Physiological and Demographic Data from Medical-Grade Devices [0.0]
Gradient Boosting Models (GBMs) outperformed sequential models in terms of training speed, interpretability, and reliability. A 5-minute prediction window was chosen for timely intervention, with the data standardized to minute-level intervals. This study highlights ML's potential to improve triage and reduce alarm fatigue.
arXiv (2024-10-30T23:24:28Z)
Generative causal testing to bridge data-driven models and scientific theories in language neuroscience [82.995061475971]
We present generative causal testing (GCT), a framework for generating concise explanations of language selectivity in the brain. We show that GCT can dissect fine-grained differences between brain areas with similar functional selectivity.
arXiv (2024-10-01T15:57:48Z)
A Demographic-Conditioned Variational Autoencoder for fMRI Distribution Sampling and Removal of Confounds [49.34500499203579]
We create a variational autoencoder (VAE)-based model, DemoVAE, to decorrelate fMRI features from demographics. We generate high-quality synthetic fMRI data based on user-supplied demographics.
arXiv (2024-05-13T17:49:20Z)
MCRAGE: Synthetic Healthcare Data for Fairness [3.0089659534785853]
We propose Minority Class Rebalancing through Augmentation by Generative modeling (MCRAGE) to augment imbalanced datasets. MCRAGE involves training a Conditional Denoising Diffusion Probabilistic Model (CDDPM) capable of generating high-quality synthetic EHR samples from underrepresented classes. We use this synthetic data to augment the existing imbalanced dataset, resulting in a more balanced distribution across all classes.
arXiv (2023-10-27T19:02:22Z)
MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space. It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv (2023-10-04T01:36:30Z)
An investigation into the impact of deep learning model choice on sex and race bias in cardiac MR segmentation [8.449342469976758]
We investigate how imbalances in subject sex and race affect AI-based cine cardiac magnetic resonance image segmentation. We find significant sex bias in three of the four models and racial bias in all of the models.
arXiv (2023-08-25T14:55:38Z)
Deep Learning for Predicting Progression of Patellofemoral Osteoarthritis Based on Lateral Knee Radiographs, Demographic Data and Symptomatic Assessments [1.1549572298362785]
This study included 1832 subjects (3276 knees) from the baseline of the MOST study. Patellofemoral (PF) joint regions of interest were identified on lateral knee X-rays using an automated landmark detection tool (BoneFinder). Risk factors included age, sex, BMI, WOMAC score, and the radiographic osteoarthritis stage of the tibiofemoral joint (KL score).
arXiv (2023-05-10T06:43:33Z)
Clinical Deterioration Prediction in Brazilian Hospitals Based on Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD). The XGBoost model obtained the best results in predicting CD on data from Brazilian hospitals.
arXiv (2022-12-17T23:29:14Z)
Efficient Out-of-Distribution Detection of Melanoma with Wavelet-based Normalizing Flows [22.335623464185105]
Melanoma is a serious form of skin cancer with a high mortality rate at later stages. Datasets are heavily imbalanced, which complicates training current state-of-the-art supervised classification AI models. We propose to use generative models to learn the benign data distribution and detect out-of-distribution (OOD) malignant images through density estimation.
arXiv (2022-08-09T09:57:56Z)
Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data. We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem. We then derive evidence lower bounds (ELBOs) for ME-NODE and develop efficient training algorithms.
arXiv (2022-02-18T22:41:51Z)
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model. We introduce two unique positive sampling strategies specifically tailored for EHR data. Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv (2021-04-07T06:02:04Z)
Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential-service breakdowns and the collapse of health care structures. This work describes a machine learning model derived from hemogram exams performed on symptomatic patients. The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv (2020-05-10T01:45:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.