Related papers: An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets

URL: http://arxiv.org/abs/2311.03425v1
Date: Mon, 6 Nov 2023 17:08:41 GMT
Title: An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets
Authors: Faris F. Gulamali, Ashwin S. Sawant, Lora Liharska, Carol R. Horowitz, Lili Chan, Patricia H. Kovatch, Ira Hofer, Karandeep Singh, Lynne D. Richardson, Emmanuel Mensah, Alexander W Charney, David L. Reich, Jianying Hu, Girish N. Nadkarni
Abstract summary: We generate a data-centric, model-agnostic, task-agnostic approach to evaluate dataset bias by investigating the relationship between how easily different groups are learned at small sample sizes (AEquity). We then apply a systematic analysis of AEq values across subpopulations to identify and mitigate manifestations of racial bias in two known cases in healthcare. AEq is a novel and broadly applicable metric that can be applied to advance equity by diagnosing and remediating bias in healthcare datasets.
Score: 32.25265709333831
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The adoption of diagnostic and prognostic algorithms in healthcare has led to concerns about the perpetuation of bias against disadvantaged groups of individuals. Deep learning methods to detect and mitigate bias have revolved around modifying models, optimization strategies, and threshold calibration with varying levels of success. Here, we generate a data-centric, model-agnostic, task-agnostic approach to evaluate dataset bias by investigating the relationship between how easily different groups are learned at small sample sizes (AEquity). We then apply a systematic analysis of AEq values across subpopulations to identify and mitigate manifestations of racial bias in two known cases in healthcare - chest X-ray diagnosis with deep convolutional neural networks and healthcare utilization prediction with multivariate logistic regression. AEq is a novel and broadly applicable metric that can be applied to advance equity by diagnosing and remediating bias in healthcare datasets.

Related papers

AI Alignment in Medical Imaging: Unveiling Hidden Biases Through Counterfactual Analysis [16.21270312974956]
We introduce a novel statistical framework to evaluate the dependency of medical imaging ML models on sensitive attributes, such as demographics. We present a practical algorithm that combines conditional latent diffusion models with statistical hypothesis testing to identify and quantify such biases.
arXiv Detail & Related papers (2025-04-28T09:28:25Z)
Conformal uncertainty quantification to evaluate predictive fairness of foundation AI model for skin lesion classes across patient demographics [8.692647930497936]
We use conformal analysis to quantify the predictive uncertainty of a vision transformer based foundation model. We show how this can be used as a fairness metric to evaluate the robustness of the feature embeddings of the foundation model.
arXiv Detail & Related papers (2025-03-31T08:06:00Z)
Detecting Dataset Bias in Medical AI: A Generalized and Modality-Agnostic Auditing Framework [8.520644988801243]
Latent bias in machine learning datasets can be amplified during training and/or hidden during testing. We present a data modality-agnostic auditing framework for generating targeted hypotheses about sources of bias. We demonstrate the broad applicability and value of our method by analyzing large-scale medical datasets.
arXiv Detail & Related papers (2025-03-13T02:16:48Z)
Unmasking Bias in AI: A Systematic Review of Bias Detection and Mitigation Strategies in Electronic Health Record-based Models [6.300835344100545]
Leveraging artificial intelligence in conjunction with electronic health records holds transformative potential to improve healthcare. Yet, addressing bias in AI, which risks worsening healthcare disparities, cannot be overlooked. This study reviews methods to detect and mitigate diverse forms of bias in AI models developed using EHR data.
arXiv Detail & Related papers (2023-10-30T18:29:15Z)
D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases. A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network. For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
Tree-Guided Rare Feature Selection and Logic Aggregation with Electronic Health Records Data [7.422597776308963]
We propose a tree-guided feature selection and logic aggregation approach for large-scale regression with rare binary features. In a suicide risk study with EHR data, our approach is able to select and aggregate prior mental health diagnoses.
arXiv Detail & Related papers (2022-06-18T03:52:43Z)
Evaluation of data imputation strategies in complex, deeply-phenotyped data sets: the case of the EU-AIMS Longitudinal European Autism Project [0.0]
We evaluate different imputation strategies to fill in missing values in clinical data from a large (total N=764) dataset. We consider a total of 160 clinical measures divided into 15 overlapping subsets of participants.
arXiv Detail & Related papers (2022-01-20T21:50:38Z)
TRAPDOOR: Repurposing backdoors to detect dataset bias in machine learning-based genomic analysis [15.483078145498085]
Under-representation of groups in datasets can lead to inaccurate predictions for certain groups, which can exacerbate systemic discrimination issues. We propose TRAPDOOR, a methodology for identification of biased datasets by repurposing a technique that has been mostly proposed for nefarious purposes: neural network backdoors. Using a real-world cancer dataset, we analyze the dataset with the bias that already existed towards white individuals and also introduce biases into datasets artificially.
arXiv Detail & Related papers (2021-08-14T17:02:02Z)
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model. We introduce two unique positive sampling strategies specifically tailored for EHR data. Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
Estimating and Improving Fairness with Adversarial Learning [65.99330614802388]
We propose an adversarial multi-task training strategy to simultaneously mitigate and detect bias in deep learning-based medical image analysis systems. Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model. We evaluate our framework on a large-scale publicly available skin lesion dataset.
arXiv Detail & Related papers (2021-03-07T03:10:32Z)
Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation. Adversarially generated samples are used during domain adaptation. Results confirm the effectiveness of our method and its generality across different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)
Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification. It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations. Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach [55.41644538483948]
This work focuses on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units. The aim is to support decision making aimed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.