An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in
Healthcare Datasets
- URL: http://arxiv.org/abs/2311.03425v1
- Date: Mon, 6 Nov 2023 17:08:41 GMT
- Title: An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in
Healthcare Datasets
- Authors: Faris F. Gulamali, Ashwin S. Sawant, Lora Liharska, Carol R. Horowitz,
Lili Chan, Patricia H. Kovatch, Ira Hofer, Karandeep Singh, Lynne D.
Richardson, Emmanuel Mensah, Alexander W Charney, David L. Reich, Jianying
Hu, Girish N. Nadkarni
- Abstract summary: We generate a data-centric, model-agnostic, task-agnostic approach to evaluate dataset bias by investigating the relationship between how easily different groups are learned at small sample sizes (AEquity).
We then apply a systematic analysis of AEq values across subpopulations to identify and mitigate manifestations of racial bias in two known cases in healthcare.
AEq is a novel and broadly applicable metric that can be applied to advance equity by diagnosing and remediating bias in healthcare datasets.
- Score: 32.25265709333831
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The adoption of diagnostic and prognostic algorithms in healthcare has led to
concerns about the perpetuation of bias against disadvantaged groups of
individuals. Deep learning methods to detect and mitigate bias have revolved
around modifying models, optimization strategies, and threshold calibration
with varying levels of success. Here, we generate a data-centric,
model-agnostic, task-agnostic approach to evaluate dataset bias by
investigating the relationship between how easily different groups are learned
at small sample sizes (AEquity). We then apply a systematic analysis of AEq
values across subpopulations to identify and mitigate manifestations of racial
bias in two known cases in healthcare: chest X-ray diagnosis with deep
convolutional neural networks and healthcare utilization prediction with
multivariate logistic regression. AEq is a novel and broadly applicable metric
that can be applied to advance equity by diagnosing and remediating bias in
healthcare datasets.
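The idea of comparing how easily different groups are learned at small sample sizes can be illustrated with a toy learning-curve comparison. The sketch below is not the authors' AEq metric, whose definition is not given in this listing. It is a hypothetical proxy built on assumptions: synthetic 1-D data, a nearest-centroid classifier, and illustrative names (`make_group`, `centroid_accuracy`, `learning_curve`) that do not come from the paper.

```python
import random
from statistics import mean


def make_group(n, separation, rng):
    """Synthetic 1-D binary data; class 1 is shifted by `separation` (assumption)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(label * separation, 1.0), label))
    return data


def centroid_accuracy(train, test):
    """Fit a nearest-centroid classifier on `train` and score it on `test`."""
    by_class = {0: [], 1: []}
    for x, y in train:
        by_class[y].append(x)
    if not by_class[0] or not by_class[1]:
        return 0.5  # only one class was sampled: treat as chance level
    c0, c1 = mean(by_class[0]), mean(by_class[1])
    correct = sum((abs(x - c1) < abs(x - c0)) == bool(y) for x, y in test)
    return correct / len(test)


def learning_curve(data, sizes, repeats=200, seed=0):
    """Mean held-out accuracy at each small training-set size."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        scores = []
        for _ in range(repeats):
            shuffled = data[:]
            rng.shuffle(shuffled)
            # Train on the first n samples, evaluate on the rest.
            scores.append(centroid_accuracy(shuffled[:n], shuffled[n:]))
        curve[n] = mean(scores)
    return curve


rng = random.Random(42)
groups = {
    "group_a": make_group(400, separation=3.0, rng=rng),  # easier to learn
    "group_b": make_group(400, separation=1.0, rng=rng),  # harder to learn
}
sizes = [4, 8, 16, 32]
curves = {name: learning_curve(d, sizes) for name, d in groups.items()}

# Print per-size accuracy for each group and the gap between them.
for n in sizes:
    a, b = curves["group_a"][n], curves["group_b"][n]
    print(n, round(a, 3), round(b, 3), round(a - b, 3))
```

A persistent gap between the two curves at small training sizes is the kind of signal a data-centric audit would look for before any model changes are considered. A real analysis would use the paper's own definition and real subpopulation data.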
Related papers
- Unmasking Bias in AI: A Systematic Review of Bias Detection and Mitigation Strategies in Electronic Health Record-based Models [6.300835344100545]
Leveraging artificial intelligence in conjunction with electronic health records holds transformative potential to improve healthcare.
Yet, addressing bias in AI, which risks worsening healthcare disparities, cannot be overlooked.
This study reviews methods to detect and mitigate diverse forms of bias in AI models developed using EHR data.
arXiv Detail & Related papers (2023-10-30T18:29:15Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- Tree-Guided Rare Feature Selection and Logic Aggregation with Electronic Health Records Data [7.422597776308963]
We propose a tree-guided feature selection and logic aggregation approach for large-scale regression with rare binary features.
In a suicide risk study with EHR data, our approach is able to select and aggregate prior mental health diagnoses.
arXiv Detail & Related papers (2022-06-18T03:52:43Z)
- Evaluation of data imputation strategies in complex, deeply-phenotyped data sets: the case of the EU-AIMS Longitudinal European Autism Project [0.0]
We evaluate different imputation strategies to fill in missing values in clinical data from a large (total N=764) dataset.
We consider a total of 160 clinical measures divided into 15 overlapping subsets of participants.
arXiv Detail & Related papers (2022-01-20T21:50:38Z)
- TRAPDOOR: Repurposing backdoors to detect dataset bias in machine learning-based genomic analysis [15.483078145498085]
Under-representation of groups in datasets can lead to inaccurate predictions for certain groups, which can exacerbate systemic discrimination issues.
We propose TRAPDOOR, a methodology for identification of biased datasets by repurposing a technique that has been mostly proposed for nefarious purposes: Neural network backdoors.
Using a real-world cancer dataset, we analyze both the bias that already existed towards white individuals and biases that we introduced into the datasets artificially.
arXiv Detail & Related papers (2021-08-14T17:02:02Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Estimating and Improving Fairness with Adversarial Learning [65.99330614802388]
We propose an adversarial multi-task training strategy to simultaneously detect and mitigate bias in deep learning-based medical image analysis systems.
Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model.
We evaluate our framework on a large-scale, publicly available skin lesion dataset.
arXiv Detail & Related papers (2021-03-07T03:10:32Z)
- Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
Adversarially generated samples are used during domain adaptation.
Results confirm the effectiveness of our method and its generality across different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
- Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach [55.41644538483948]
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making aimed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.