An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets
- URL: http://arxiv.org/abs/2311.03425v1
- Date: Mon, 6 Nov 2023 17:08:41 GMT
- Authors: Faris F. Gulamali, Ashwin S. Sawant, Lora Liharska, Carol R. Horowitz,
  Lili Chan, Patricia H. Kovatch, Ira Hofer, Karandeep Singh, Lynne D.
  Richardson, Emmanuel Mensah, Alexander W Charney, David L. Reich, Jianying
  Hu, Girish N. Nadkarni
- Abstract summary: We develop a data-centric, model-agnostic, task-agnostic approach to evaluate dataset bias by investigating the relationship between how easily different groups are learned at small sample sizes (AEquity).
We then apply a systematic analysis of AEq values across subpopulations to identify and mitigate manifestations of racial bias in two known cases in healthcare.
AEq is a novel and broadly applicable metric that can be applied to advance equity by diagnosing and remediating bias in healthcare datasets.
- Score: 32.25265709333831
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The adoption of diagnosis and prognostic algorithms in healthcare has led to
concerns about the perpetuation of bias against disadvantaged groups of
individuals. Deep learning methods to detect and mitigate bias have revolved
around modifying models, optimization strategies, and threshold calibration
with varying levels of success. Here, we develop a data-centric,
model-agnostic, task-agnostic approach to evaluate dataset bias by
investigating the relationship between how easily different groups are learned
at small sample sizes (AEquity, or AEq). We then apply a systematic analysis of AEq
values across subpopulations to identify and mitigate manifestations of racial
bias in two known cases in healthcare: chest X-ray diagnosis with deep
convolutional neural networks and healthcare utilization prediction with
multivariate logistic regression. AEq is a novel and broadly applicable metric
that can be applied to advance equity by diagnosing and remediating bias in
healthcare datasets.
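The abstract does not state the AEq formula. As a rough illustration of the underlying idea (comparing how easily a model learns each subgroup when only a few examples are available), here is a minimal stdlib-only Python sketch. Everything in it is a hypothetical stand-in, not the authors' method: the nearest-centroid classifier, the "mean accuracy over a small-sample learning curve" score, the sample sizes, and the synthetic groups produced by `make_group`.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

# One labelled example: (feature vector, binary class label).
Example = Tuple[List[float], int]


def fit_centroids(train: Sequence[Example]) -> Dict[int, List[float]]:
    """Nearest-centroid 'training': average the feature vectors of each class."""
    by_label: Dict[int, List[List[float]]] = {}
    for x, y in train:
        by_label.setdefault(y, []).append(x)
    return {y: [statistics.fmean(col) for col in zip(*xs)] for y, xs in by_label.items()}


def predict(centroids: Dict[int, List[float]], x: List[float]) -> int:
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def learning_curve(pool, test, sample_sizes, repeats, rng) -> List[float]:
    """Mean held-out accuracy after training on n random examples, for each n."""
    curve = []
    for n in sample_sizes:
        accs = []
        for _ in range(repeats):
            model = fit_centroids(rng.sample(pool, n))
            correct = sum(predict(model, x) == y for x, y in test)
            accs.append(correct / len(test))
        curve.append(statistics.fmean(accs))
    return curve


def learnability_by_group(
    data_by_group: Dict[str, List[Example]],
    sample_sizes: Sequence[int] = (4, 8, 16, 32),
    repeats: int = 20,
    seed: int = 0,
) -> Dict[str, float]:
    """Hypothetical learnability proxy (NOT the paper's AEq formula):
    the average of each group's small-sample learning curve, where the
    model is trained and evaluated only on that group's own examples."""
    rng = random.Random(seed)
    scores = {}
    for group, examples in data_by_group.items():
        shuffled = list(examples)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        pool, test = shuffled[:half], shuffled[half:]  # training pool / held-out split
        scores[group] = statistics.fmean(
            learning_curve(pool, test, sample_sizes, repeats, rng)
        )
    return scores


def make_group(n: int, separation: float, rng: random.Random) -> List[Example]:
    """Synthetic 2-D data; a larger `separation` makes the two classes easier to tell apart."""
    data = []
    for i in range(n):
        y = i % 2
        center = separation if y == 1 else -separation
        data.append(([rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)], y))
    return data


if __name__ == "__main__":
    rng = random.Random(42)
    groups = {
        "group_a": make_group(200, 2.0, rng),   # well-separated classes
        "group_b": make_group(200, 0.25, rng),  # heavily overlapping classes
    }
    scores = learnability_by_group(groups)
    for name, score in scores.items():
        print(f"{name}: mean small-sample accuracy = {score:.3f}")
    print(f"learnability gap (a - b) = {scores['group_a'] - scores['group_b']:.3f}")
```

A lower score means the model needs more examples before it learns that subgroup well, which is the kind of group-level difference the abstract describes AEq as probing. In practice one would substitute the task's actual model (for example, a CNN or logistic regression) and metric for the toy classifier used here.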
Related papers
- AI Alignment in Medical Imaging: Unveiling Hidden Biases Through Counterfactual Analysis [16.21270312974956]
  We introduce a novel statistical framework to evaluate the dependency of medical imaging ML models on sensitive attributes, such as demographics.
  We present a practical algorithm that combines conditional latent diffusion models with statistical hypothesis testing to identify and quantify such biases.
  arXiv (2025-04-28T09:28:25Z)
- Conformal uncertainty quantification to evaluate predictive fairness of foundation AI model for skin lesion classes across patient demographics [8.692647930497936]
  We use conformal analysis to quantify the predictive uncertainty of a vision transformer based foundation model.
  We show how this can be used as a fairness metric to evaluate the robustness of the feature embeddings of the foundation model.
  arXiv (2025-03-31T08:06:00Z)
- Detecting Dataset Bias in Medical AI: A Generalized and Modality-Agnostic Auditing Framework [8.520644988801243]
  Latent bias in machine learning datasets can be amplified during training and/or hidden during testing.
  We present a data modality-agnostic auditing framework for generating targeted hypotheses about sources of bias.
  We demonstrate the broad applicability and value of our method by analyzing large-scale medical datasets.
  arXiv (2025-03-13T02:16:48Z)
- Unmasking Bias in AI: A Systematic Review of Bias Detection and Mitigation Strategies in Electronic Health Record-based Models [6.300835344100545]
  Leveraging artificial intelligence in conjunction with electronic health records holds transformative potential to improve healthcare.
  Yet, addressing bias in AI, which risks worsening healthcare disparities, cannot be overlooked.
  This study reviews methods to detect and mitigate diverse forms of bias in AI models developed using EHR data.
  arXiv (2023-10-30T18:29:15Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
  We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
  A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
  For each interaction, such as weakening or deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
  arXiv (2022-08-10T03:41:48Z)
- Tree-Guided Rare Feature Selection and Logic Aggregation with Electronic Health Records Data [7.422597776308963]
  We propose a tree-guided feature selection and logic aggregation approach for large-scale regression with rare binary features.
  In a suicide risk study with EHR data, our approach is able to select and aggregate prior mental health diagnoses.
  arXiv (2022-06-18T03:52:43Z)
- Evaluation of data imputation strategies in complex, deeply-phenotyped data sets: the case of the EU-AIMS Longitudinal European Autism Project [0.0]
  We evaluate different imputation strategies to fill in missing values in clinical data from a large (total N=764) dataset.
  We consider a total of 160 clinical measures divided into 15 overlapping subsets of participants.
  arXiv (2022-01-20T21:50:38Z)
- TRAPDOOR: Repurposing backdoors to detect dataset bias in machine learning-based genomic analysis [15.483078145498085]
  Under-representation of groups in datasets can lead to inaccurate predictions for certain groups, which can exacerbate systemic discrimination issues.
  We propose TRAPDOOR, a methodology for identifying biased datasets by repurposing a technique that has mostly been proposed for nefarious purposes: neural network backdoors.
  Using a real-world cancer dataset, we analyze the pre-existing bias toward white individuals in the dataset and also artificially introduce biases into datasets.
  arXiv (2021-08-14T17:02:02Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
  This paper proposes a novel contrastive regularized clinical classification model.
  We introduce two unique positive sampling strategies specifically tailored for EHR data.
  Our framework yields highly competitive experimental results in predicting mortality risk on real-world COVID-19 EHR data.
  arXiv (2021-04-07T06:02:04Z)
- Estimating and Improving Fairness with Adversarial Learning [65.99330614802388]
  We propose an adversarial multi-task training strategy to simultaneously mitigate and detect bias in the deep learning-based medical image analysis system.
  Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model.
  We evaluate our framework on a large-scale, publicly available skin lesion dataset.
  arXiv (2021-03-07T03:10:32Z)
- Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
  We propose a data augmentation method to facilitate domain adaptation.
  Adversarially generated samples are used during domain adaptation.
  Results confirm the effectiveness of our method and its generality across different tasks.
  arXiv (2021-01-13T03:20:20Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
  We present a relation-driven semi-supervised framework for medical image classification.
  It exploits the unlabeled data by encouraging prediction consistency for a given input under perturbations.
  Our method outperforms many state-of-the-art semi-supervised learning methods in both single-label and multi-label image classification scenarios.
  arXiv (2020-05-15T06:57:54Z)
- Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach [55.41644538483948]
  This work focuses on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
  The aim is to support decision making aimed at reducing the incidence rate of infections.
  arXiv (2020-05-07T16:13:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.