An Investigation Into Race Bias in Random Forest Models Based on Breast
DCE-MRI Derived Radiomics Features
- URL: http://arxiv.org/abs/2309.17197v1
- Date: Fri, 29 Sep 2023 12:45:53 GMT
- Title: An Investigation Into Race Bias in Random Forest Models Based on Breast
DCE-MRI Derived Radiomics Features
- Authors: Mohamed Huti, Tiarna Lee, Elinor Sawyer, Andrew P. King
- Abstract summary: We investigate the potential for race bias in random forest (RF) models trained using radiomics features.
RF models trained to predict tumour molecular subtype using race-imbalanced data seem to produce biased behaviour.
- Score: 1.9077771032617559
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent research has shown that artificial intelligence (AI) models can
exhibit bias in performance when trained using data that are imbalanced by
protected attribute(s). Most work to date has focused on deep learning models,
but classical AI techniques that make use of hand-crafted features may also be
susceptible to such bias. In this paper we investigate the potential for race
bias in random forest (RF) models trained using radiomics features. Our
application is prediction of tumour molecular subtype from dynamic contrast
enhanced magnetic resonance imaging (DCE-MRI) of breast cancer patients. Our
results show that radiomics features derived from DCE-MRI data do contain
race-identifiable information, and that RF models can be trained to predict
White and Black race from these data with 60-70% accuracy, depending on the
subset of features used. Furthermore, RF models trained to predict tumour
molecular subtype using race-imbalanced data seem to produce biased behaviour,
exhibiting better performance on test data from the race on which they were
trained.
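The bias finding above rests on comparing model performance separately for each race group in the test data. A minimal sketch of such a per-group comparison follows. It is an illustration only, not the authors' code: the helper names `accuracy_by_group` and `accuracy_gap`, the 0/1 class labels, and the toy predictions are all hypothetical and do not come from the paper.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy}, splitting predictions by a protected attribute."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("y_true, y_pred and groups must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Difference between the best and worst per-group accuracy."""
    return max(per_group.values()) - min(per_group.values())


# Toy, made-up labels (hypothetical; not data from the paper).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0]
groups = ["White", "White", "White", "Black", "Black", "Black"]

per_group = accuracy_by_group(y_true, y_pred, groups)
gap = accuracy_gap(per_group)
```

A large gap between groups on held-out data is the kind of signal the abstract describes as biased behaviour. Accuracy is used here only for simplicity, and any other metric could be split by group in the same way.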
Related papers
- Development and Comparative Analysis of Machine Learning Models for Hypoxemia Severity Triage in CBRNE Emergency Scenarios Using Physiological and Demographic Data from Medical-Grade Devices [0.0]
Gradient Boosting Models (GBMs) outperformed sequential models in terms of training speed, interpretability, and reliability.
A 5-minute prediction window was chosen for timely intervention, with minute-levels standardizing the data.
This study highlights ML's potential to improve triage and reduce alarm fatigue.
arXiv Detail & Related papers (2024-10-30T23:24:28Z)
- Generative causal testing to bridge data-driven models and scientific theories in language neuroscience [82.995061475971]
We present generative causal testing (GCT), a framework for generating concise explanations of language selectivity in the brain.
We show that GCT can dissect fine-grained differences between brain areas with similar functional selectivity.
arXiv Detail & Related papers (2024-10-01T15:57:48Z)
- A Demographic-Conditioned Variational Autoencoder for fMRI Distribution Sampling and Removal of Confounds [49.34500499203579]
We create a variational autoencoder (VAE)-based model, DemoVAE, to decorrelate fMRI features from demographics.
We generate high-quality synthetic fMRI data based on user-supplied demographics.
arXiv Detail & Related papers (2024-05-13T17:49:20Z)
- MCRAGE: Synthetic Healthcare Data for Fairness [3.0089659534785853]
We propose Minority Class Rebalancing through Augmentation by Generative modeling (MCRAGE) to augment imbalanced datasets.
MCRAGE involves training a Conditional Denoising Diffusion Probabilistic Model (CDDPM) capable of generating high-quality synthetic EHR samples from underrepresented classes.
We use this synthetic data to augment the existing imbalanced dataset, resulting in a more balanced distribution across all classes.
arXiv Detail & Related papers (2023-10-27T19:02:22Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge the sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- An investigation into the impact of deep learning model choice on sex and race bias in cardiac MR segmentation [8.449342469976758]
We investigate how imbalances in subject sex and race affect AI-based cine cardiac magnetic resonance image segmentation.
We find significant sex bias in three of the four models and racial bias in all of the models.
arXiv Detail & Related papers (2023-08-25T14:55:38Z)
- Deep Learning for Predicting Progression of Patellofemoral Osteoarthritis Based on Lateral Knee Radiographs, Demographic Data and Symptomatic Assessments [1.1549572298362785]
This study included subjects (1832 subjects, 3276 knees) from the baseline of the MOST study.
PF joint regions-of-interest were identified using an automated landmark detection tool (BoneFinder) on lateral knee X-rays.
Risk factors included age, sex, BMI and WOMAC score, and the radiographic osteoarthritis stage of the tibiofemoral joint (KL score).
arXiv Detail & Related papers (2023-05-10T06:43:33Z)
- Clinical Deterioration Prediction in Brazilian Hospitals Based on Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD).
The XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
arXiv Detail & Related papers (2022-12-17T23:29:14Z)
- Efficient Out-of-Distribution Detection of Melanoma with Wavelet-based Normalizing Flows [22.335623464185105]
Melanoma is a serious form of skin cancer with a high mortality rate at later stages.
Datasets are heavily imbalanced, which complicates training current state-of-the-art supervised classification AI models.
We propose to use generative models to learn the benign data distribution and detect Out-of-Distribution (OOD) malignant images through density estimation.
arXiv Detail & Related papers (2022-08-09T09:57:56Z)
- Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential services breaking down and health care structures collapsing.
This work describes a machine learning model derived from hemogram exam data collected from symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.