Explaining medical AI performance disparities across sites with
confounder Shapley value analysis
- URL: http://arxiv.org/abs/2111.08168v1
- Date: Fri, 12 Nov 2021 18:54:10 GMT
- Title: Explaining medical AI performance disparities across sites with
confounder Shapley value analysis
- Authors: Eric Wu, Kevin Wu, James Zou
- Abstract summary: Multi-site evaluations are key to diagnosing such disparities.
Our framework provides a method for quantifying the marginal and cumulative effect of each type of bias on the overall performance difference.
We demonstrate its usefulness in a case study of a deep learning model trained to detect the presence of pneumothorax.
- Score: 8.785345834486057
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical AI algorithms can often experience degraded performance when
evaluated on previously unseen sites. Addressing cross-site performance
disparities is key to ensuring that AI is equitable and effective when deployed
on diverse patient populations. Multi-site evaluations are key to diagnosing
such disparities as they can test algorithms across a broader range of
potential biases such as patient demographics, equipment types, and technical
parameters. However, such tests do not explain why the model performs worse.
Our framework provides a method for quantifying the marginal and cumulative
effect of each type of bias on the overall performance difference when a model
is evaluated on external data. We demonstrate its usefulness in a case study of
a deep learning model trained to detect the presence of pneumothorax, where our
framework can help explain up to 60% of the discrepancy in performance across
different sites with known biases like disease comorbidities and imaging
parameters.
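The framework's core computation lends itself to a short sketch: treat each candidate bias (comorbidity mix, imaging parameters, etc.) as a player in a cooperative game whose value is the portion of the cross-site gap recovered when the external data is adjusted to match the internal distribution on that subset of confounders, then attribute the gap with Shapley values. The snippet below is a minimal illustrative sketch, not the authors' released implementation; `evaluate` (e.g., returning AUC) and `match_on` (resampling the external set so the listed confounders match the internal distribution) are hypothetical helpers.

```python
from itertools import combinations
from math import factorial

def confounder_shapley(model, internal, external, confounders, evaluate, match_on):
    """Attribute the internal-vs-external performance gap to confounders.

    Illustrative sketch only; `evaluate(model, data)` and
    `match_on(external, internal, subset)` are assumed helpers,
    not functions from the paper's code.
    """
    n = len(confounders)
    perf_internal = evaluate(model, internal)
    total_gap = perf_internal - evaluate(model, external)

    def explained(subset):
        # Portion of the gap recovered after adjusting the external data
        # to match the internal distribution on `subset`.
        adjusted = match_on(external, internal, list(subset))
        return total_gap - (perf_internal - evaluate(model, adjusted))

    shapley = {}
    for c in confounders:
        others = [x for x in confounders if x != c]
        phi = 0.0
        for k in range(len(others) + 1):
            for s in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += weight * (explained(s + (c,)) - explained(s))
        shapley[c] = phi
    return total_gap, shapley
```

By the Shapley efficiency property, the per-confounder attributions sum to the gap explained when all confounders are adjusted jointly, which matches the framework's notion of marginal and cumulative effects.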
Related papers
- Slicing Through Bias: Explaining Performance Gaps in Medical Image Analysis using Slice Discovery Methods [5.274804664403783]
We use Slice Discovery Methods (SDMs) to identify interpretable underperforming subsets of the data and to form hypotheses about the causes of observed performance disparities.
Our study demonstrates the effectiveness of SDMs in hypothesis formulation and yields an explanation of previously observed but unexplained performance disparities between male and female patients.
arXiv Detail & Related papers (2024-06-17T23:08:46Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- Unbiased Pain Assessment through Wearables and EHR Data: Multi-attribute Fairness Loss-based CNN Approach [3.799109312082668]
We propose a Multi-attribute Fairness Loss (MAFL) based CNN model to account for any sensitive attributes included in the data.
We compare the proposed model with well-known existing mitigation procedures, and our studies show that the implemented model performs favorably compared with state-of-the-art methods.
arXiv Detail & Related papers (2023-07-03T09:21:36Z)
- AI in the Loop -- Functionalizing Fold Performance Disagreement to Monitor Automated Medical Image Segmentation Pipelines [0.0]
Methods for automatically flagging poor-performing predictions are essential for safely implementing machine learning in clinical practice.
We present a readily adoptable method using sub-models trained on different dataset folds, where their disagreement serves as a surrogate for model confidence (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-05-15T21:35:23Z)
- Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z)
- A method for comparing multiple imputation techniques: a case study on the U.S. National COVID Cohort Collaborative [1.259457977936316]
We numerically evaluate strategies for handling missing data in the context of statistical analysis.
Our approach could effectively highlight the most valid and performant missing-data handling strategy.
arXiv Detail & Related papers (2022-06-13T19:49:54Z)
- Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models [50.537859423741644]
Training a model on an imbalanced dataset can introduce unique challenges to the learning problem.
We look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features.
arXiv Detail & Related papers (2022-04-04T09:38:38Z)
- What Do You See in this Patient? Behavioral Testing of Clinical NLP Models [69.09570726777817]
We introduce an extendable testing framework that evaluates the behavior of clinical outcome models with respect to changes in the input.
We show that model behavior varies drastically even when fine-tuned on the same data and that allegedly best-performing models have not always learned the most medically plausible patterns.
arXiv Detail & Related papers (2021-11-30T15:52:04Z)
- Algorithmic encoding of protected characteristics and its implications on disparities across subgroups [17.415882865534638]
Machine learning models may pick up undesirable correlations between a patient's racial identity and clinical outcome.
Very little is known about how these biases are encoded and how one may reduce or even remove disparate performance.
arXiv Detail & Related papers (2021-10-27T20:30:57Z)
- Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
Adversarially generated samples are used during domain adaptation.
Results confirm the effectiveness of our method and its generality across different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits unlabeled data by encouraging prediction consistency for a given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
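As referenced in the "AI in the Loop" entry above, disagreement between sub-models trained on different dataset folds can serve as a surrogate for model confidence. The sketch below illustrates that idea under assumed inputs (binary segmentation masks predicted by each fold model as NumPy arrays); the Dice-based disagreement measure and the threshold are illustrative choices, not details taken from that paper.

```python
import numpy as np

def fold_disagreement(fold_masks):
    """Mean pairwise Dice disagreement among binary masks predicted by
    sub-models trained on different folds (illustrative sketch)."""
    def dice(a, b):
        inter = np.logical_and(a, b).sum()
        denom = a.sum() + b.sum()
        return 1.0 if denom == 0 else 2.0 * inter / denom

    pairs = [(i, j) for i in range(len(fold_masks))
             for j in range(i + 1, len(fold_masks))]
    return 1.0 - float(np.mean([dice(fold_masks[i], fold_masks[j])
                                for i, j in pairs]))

def flag_for_review(fold_masks, threshold=0.2):
    # High disagreement between fold models suggests low confidence;
    # such cases would be routed to a human reviewer.
    return fold_disagreement(fold_masks) > threshold
```

Cases whose disagreement exceeds the chosen threshold would be flagged for human review rather than trusted automatically.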
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.