Disparate Censorship & Undertesting: A Source of Label Bias in Clinical
Machine Learning
- URL: http://arxiv.org/abs/2208.01127v1
- Date: Mon, 1 Aug 2022 20:15:31 GMT
- Title: Disparate Censorship & Undertesting: A Source of Label Bias in Clinical
Machine Learning
- Authors: Trenton Chang, Michael W. Sjoding, Jenna Wiens
- Abstract summary: Disparate censorship in patients of equivalent risk leads to undertesting in certain groups, and in turn, more biased labels for such groups.
Our findings call attention to disparate censorship as a source of label bias in clinical ML models.
- Score: 14.133370438685969
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As machine learning (ML) models gain traction in clinical applications,
understanding the impact of clinician and societal biases on ML models is
increasingly important. While biases can arise in the labels used for model
training, the many sources from which these biases arise are not yet
well-studied. In this paper, we highlight disparate censorship (i.e.,
differences in testing rates across patient groups) as a source of label bias
that clinical ML models may amplify, potentially causing harm. Many patient
risk-stratification models are trained using the results of clinician-ordered
diagnostic and laboratory tests as labels. Patients without test results are
often assigned a negative label, which assumes that untested patients do not
experience the outcome. Since orders are affected by clinical and resource
considerations, testing may not be uniform in patient populations, giving rise
to disparate censorship. Disparate censorship in patients of equivalent risk
leads to undertesting in certain groups, and in turn, more biased labels for
such groups. Using such biased labels in standard ML pipelines could contribute
to gaps in model performance across patient groups. Here, we theoretically and
empirically characterize conditions in which disparate censorship or
undertesting affect model performance across subgroups. Our findings call
attention to disparate censorship as a source of label bias in clinical ML
models.
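To make the mechanism concrete, the sketch below simulates disparate censorship in a toy cohort: two groups share the same underlying outcome rate, but one is tested far less often, and untested patients default to a negative label. The group sizes, outcome rate, and testing rates are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two groups with identical underlying outcome risk (illustrative values).
group = rng.integers(0, 2, size=n)           # group 0 or group 1
y_true = rng.binomial(1, 0.10, size=n)       # true outcome, 10% rate in both groups

# Disparate censorship: group 1 is tested far less often than group 0 (assumed rates).
test_rate = np.where(group == 0, 0.8, 0.3)
tested = rng.binomial(1, test_rate).astype(bool)

# Standard label construction: untested patients are assigned a negative label.
y_observed = np.where(tested, y_true, 0)

for g in (0, 1):
    in_group = group == g
    positives = (y_true == 1) & in_group
    mislabeled = positives & (y_observed == 0)
    print(f"group {g}: {mislabeled.sum() / positives.sum():.2f} of true positives labeled negative")
```

Although the true outcome rate is identical in both groups, the undertested group accumulates far more false-negative labels, which is the label bias the paper characterizes.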
Related papers
- From Biased Selective Labels to Pseudo-Labels: An Expectation-Maximization Framework for Learning from Biased Decisions [9.440055827786596]
We study a clinically-inspired selective label problem called disparate censorship.
Disparate Censorship Expectation-Maximization (DCEM) is an algorithm for learning in the presence of such censorship.
arXiv Detail & Related papers (2024-06-27T03:33:38Z)
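As a rough illustration of the expectation-maximization idea in this line of work, the sketch below alternates between imputing soft labels for untested patients and refitting a simple model. It is a generic EM-style pseudo-labeling loop over assumed inputs (`X`, `y_obs`, and a boolean `tested` mask), not the paper's exact DCEM objective.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def em_pseudo_label(X, y_obs, tested, n_iters=10):
    """EM-style sketch: treat outcomes of untested patients as latent, alternating
    between (E) imputing soft labels and (M) refitting the outcome model.
    Generic illustration only; not the exact DCEM algorithm."""
    # `tested` is a boolean array; initialize on tested patients only.
    model = LogisticRegression(max_iter=1000).fit(X[tested], y_obs[tested])
    for _ in range(n_iters):
        # E-step: posterior probability of a positive outcome for untested patients.
        p_pos = model.predict_proba(X[~tested])[:, 1]
        # M-step: refit on observed labels plus weighted soft pseudo-labels.
        X_aug = np.vstack([X[tested], X[~tested], X[~tested]])
        y_aug = np.concatenate([y_obs[tested], np.ones(len(p_pos)), np.zeros(len(p_pos))])
        w_aug = np.concatenate([np.ones(tested.sum()), p_pos, 1.0 - p_pos])
        model = LogisticRegression(max_iter=1000).fit(X_aug, y_aug, sample_weight=w_aug)
    return model
```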
- How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers? [49.35105290167996]
Pruning has emerged as a powerful technique for compressing deep neural networks, reducing memory usage and inference time without significantly affecting overall performance.
This work represents a first step toward understanding the impact of pruning on model behavior in deep long-tailed, multi-label medical image classification.
arXiv Detail & Related papers (2023-08-17T20:40:30Z)
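For reference, the kind of pruning studied here can be reproduced with PyTorch's built-in magnitude pruning; the toy model, layer choice, and 30% sparsity below are placeholders rather than settings from the paper.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy classifier standing in for a medical image model (illustrative only).
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 14))

# L1 (magnitude) unstructured pruning: zero out the 30% smallest weights per Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the pruning mask into the weight tensor

n_zero = sum((m.weight == 0).sum().item() for m in model.modules() if isinstance(m, nn.Linear))
n_total = sum(m.weight.numel() for m in model.modules() if isinstance(m, nn.Linear))
print(f"weight sparsity: {n_zero / n_total:.2f}")
```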
- Avoiding Biased Clinical Machine Learning Model Performance Estimates in the Presence of Label Selection [3.3944964838781093]
We describe three classes of label selection and simulate five causally distinct scenarios to assess how particular selection mechanisms bias a suite of commonly reported binary machine learning model performance metrics.
We find that naive estimates of AUROC on the observed population undershoot actual performance by up to 20%.
Such a disparity could be large enough to lead to the wrongful termination of a successful clinical decision support tool.
arXiv Detail & Related papers (2022-09-15T22:30:14Z)
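The headline effect, that AUROC evaluated only on the tested (label-observed) population can understate performance on the full population, is easy to reproduce in simulation. The selection mechanism and parameters below are invented for illustration and are not the paper's five scenarios.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 100_000

risk = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-risk)))      # true outcome
score = risk + rng.normal(scale=0.5, size=n)       # risk model's score

# Label selection: higher-scoring patients are more likely to be tested, so outcome
# labels are only observed for a score-dependent subset (assumed mechanism).
tested = rng.random(n) < 1 / (1 + np.exp(-2 * score))

print("AUROC, full population, true labels: ", round(roc_auc_score(y, score), 3))
print("AUROC, tested (observed) subset only:",
      round(roc_auc_score(y[tested], score[tested]), 3))
```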
- Write It Like You See It: Detectable Differences in Clinical Notes By Race Lead To Differential Model Recommendations [15.535251319178379]
We investigate the level of implicit race information available to machine learning models and human experts.
We find that models can identify patient self-reported race from clinical notes even when the notes are stripped of explicit indicators of race.
We show that models trained on these race-redacted clinical notes can still perpetuate existing biases in clinical treatment decisions.
arXiv Detail & Related papers (2022-05-08T18:24:11Z)
- What Do You See in this Patient? Behavioral Testing of Clinical NLP Models [69.09570726777817]
We introduce an extendable testing framework that evaluates how clinical outcome models behave under changes to their input.
We show that model behavior varies drastically even when fine-tuned on the same data and that allegedly best-performing models have not always learned the most medically plausible patterns.
arXiv Detail & Related papers (2021-11-30T15:52:04Z)
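A behavioral test in this spirit can be as small as perturbing one attribute of a note and checking whether the prediction moves. The `predict_risk` callable and the example perturbation below are hypothetical stand-ins, not part of the paper's framework.

```python
from typing import Callable

def perturbation_test(predict_risk: Callable[[str], float], note: str,
                      original: str, replacement: str, tolerance: float = 0.05) -> bool:
    """Return True if swapping one attribute in the note moves the predicted
    risk by no more than `tolerance` (illustrative behavioral check)."""
    baseline = predict_risk(note)
    perturbed = predict_risk(note.replace(original, replacement))
    return abs(perturbed - baseline) <= tolerance

# Hypothetical usage with a clinical risk model exposing predict_risk(text) -> float:
# note = "72-year-old male with shortness of breath and fever."
# assert perturbation_test(model.predict_risk, note, "male", "female")
```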
- Algorithmic encoding of protected characteristics and its implications on disparities across subgroups [17.415882865534638]
Machine learning models may pick up undesirable correlations between a patient's racial identity and clinical outcome.
Very little is known about how these biases are encoded and how one may reduce or even remove disparate performance.
arXiv Detail & Related papers (2021-10-27T20:30:57Z)
- LOGAN: Local Group Bias Detection by Clustering [86.38331353310114]
We argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model.
We propose LOGAN, a new bias detection technique based on clustering.
Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region.
arXiv Detail & Related papers (2020-10-06T16:42:51Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Analysing Risk of Coronary Heart Disease through Discriminative Neural Networks [18.124078832445967]
In critical applications such as diagnostics, the class imbalance typical of disease-risk datasets cannot be overlooked.
We show how this class imbalance can be handled in neural networks using a discriminative model and a contrastive loss.
arXiv Detail & Related papers (2020-06-17T06:30:00Z)
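For context, a contrastive loss of the kind mentioned here pulls embeddings of same-class pairs together and pushes different-class pairs apart; the margin-based formulation below is the classic version and may differ from the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1: torch.Tensor, z2: torch.Tensor,
                     same_class: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Margin-based contrastive loss over embedding pairs.
    z1, z2: (batch, dim) embeddings; same_class: (batch,) tensor of 0./1. flags."""
    dist = F.pairwise_distance(z1, z2)
    pull = same_class * dist.pow(2)                         # same-class pairs: shrink distance
    push = (1 - same_class) * F.relu(margin - dist).pow(2)  # different-class pairs: enforce margin
    return 0.5 * (pull + push).mean()
```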
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential-service breakdowns and the collapse of health care structures.
This work describes a machine learning model built from hemogram exams performed on symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
Dorfman showed 80 years ago that, when the prevalence of a disease is low, testing pooled groups of people can be more efficient than testing each person individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
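The efficiency argument behind Dorfman's scheme reduces to a short expected-value calculation: pool k samples, and retest individually only when the pool is positive. The noise-free sketch below works through that arithmetic for an assumed prevalence; the paper's contribution is the harder noisy, adaptive setting.

```python
# Expected tests per person under Dorfman two-stage pooling with pool size k and
# prevalence p, assuming a noise-free test (illustrative arithmetic only).
def dorfman_tests_per_person(p: float, k: int) -> float:
    prob_pool_positive = 1 - (1 - p) ** k
    return 1 / k + prob_pool_positive  # one pooled test shared by k, plus k retests if positive

p = 0.02  # assumed 2% prevalence
best_k = min(range(2, 31), key=lambda k: dorfman_tests_per_person(p, k))
print(f"pool size {best_k}: {dorfman_tests_per_person(p, best_k):.3f} expected tests per person "
      f"(vs. 1.0 for individual testing)")
```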