Evaluating the Impact of Pulse Oximetry Bias in Machine Learning under Counterfactual Thinking
- URL: http://arxiv.org/abs/2408.04396v1
- Date: Thu, 8 Aug 2024 12:03:03 GMT
- Title: Evaluating the Impact of Pulse Oximetry Bias in Machine Learning under Counterfactual Thinking
- Authors: Inês Martins, João Matos, Tiago Gonçalves, Leo A. Celi, A. Ian Wong, Jaime S. Cardoso,
- Abstract summary: This study addresses the technical challenges of quantifying the impact of medical device bias in machine learning models.
Our experiments compare a "perfect world", without pulse oximetry bias, using SaO2 (blood-gas), to the "actual world", with biased measurements, using SpO2 (pulse oximetry).
Patients with overestimation of O2 by pulse oximetry of > 3% had significant decreases in mortality prediction recall.
- Score: 3.7415756194561753
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Algorithmic bias in healthcare mirrors existing data biases. However, the factors driving unfairness are not always known. Medical devices capture significant amounts of data but are prone to errors; for instance, pulse oximeters overestimate the arterial oxygen saturation of darker-skinned individuals, leading to worse outcomes. The impact of this bias in machine learning (ML) models remains unclear. This study addresses the technical challenges of quantifying the impact of medical device bias in downstream ML. Our experiments compare a "perfect world", without pulse oximetry bias, using SaO2 (blood-gas), to the "actual world", with biased measurements, using SpO2 (pulse oximetry). Under this counterfactual design, two models are trained with identical data, features, and settings, except for the method of measuring oxygen saturation: models using SaO2 are a "control" and models using SpO2 a "treatment". The blood-gas oximetry linked dataset was a suitable test-bed, containing 163,396 nearly-simultaneous SpO2 - SaO2 paired measurements, aligned with a wide array of clinical features and outcomes. We studied three classification tasks: in-hospital mortality, respiratory SOFA score in the next 24 hours, and SOFA score increase by two points. Models using SaO2 instead of SpO2 generally showed better performance. Patients with overestimation of O2 by pulse oximetry of > 3% had significant decreases in mortality prediction recall, from 0.63 to 0.59, P < 0.001. This mirrors clinical processes where biased pulse oximetry readings provide clinicians with false reassurance of patients' oxygen levels. A similar degradation happened in ML models, with pulse oximetry biases leading to more false negatives in predicting adverse outcomes.
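The subgroup comparison reported in the abstract (recall of the SaO2 "control" model versus the SpO2 "treatment" model among patients whose pulse oximetry overestimates O2 by more than 3%) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code: it assumes "> 3%" means SpO2 minus SaO2 greater than 3 percentage points, and the `Record` fields and function names are invented. It also assumes both models have already been trained and leaves out the significance test.

```python
# Hypothetical sketch of the subgroup recall comparison described above.
# Assumes each record already holds the paired SpO2/SaO2 readings, the true
# outcome, and the binary predictions of the SaO2 ("control") and SpO2
# ("treatment") models; none of these names come from the paper.
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    spo2: float          # pulse oximetry reading, %
    sao2: float          # arterial blood-gas reading, %
    label: int           # 1 = adverse outcome (e.g. in-hospital death)
    pred_control: int    # prediction from the model trained with SaO2
    pred_treatment: int  # prediction from the model trained with SpO2


def recall(labels: List[int], preds: List[int]) -> float:
    """TP / (TP + FN); NaN when the subgroup has no positive cases."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else math.nan


def overestimated(records: List[Record], margin: float = 3.0) -> List[Record]:
    """Records where SpO2 exceeds SaO2 by more than `margin` percentage points."""
    return [r for r in records if r.spo2 - r.sao2 > margin]


def compare_recall(records: List[Record], margin: float = 3.0) -> Tuple[float, float]:
    """Return (control recall, treatment recall) on the overestimated subgroup."""
    sub = overestimated(records, margin)
    labels = [r.label for r in sub]
    return (recall(labels, [r.pred_control for r in sub]),
            recall(labels, [r.pred_treatment for r in sub]))


# Toy records (made up, not study data).
records = [
    Record(95, 90, 1, 1, 0),  # overestimated by 5: control catches it, treatment misses
    Record(96, 91, 1, 1, 1),  # overestimated by 5: both models catch it
    Record(93, 90, 1, 0, 0),  # overestimated by exactly 3: excluded by the strict ">"
    Record(97, 92, 0, 0, 0),  # overestimated, but a negative case
]
print(compare_recall(records))  # (1.0, 0.5)
```

In this toy example the treatment model's missed positive is exactly the kind of false negative the abstract attributes to biased SpO2 readings.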
Related papers
- Causal Inference with Double/Debiased Machine Learning for Evaluating the Health Effects of Multiple Mismeasured Pollutants [9.545421693714768]
This paper addresses estimation and inference for the causal effect of one constituent in the presence of other PM2.5 constituents.
We demonstrated that the proposed estimator with regression calibration is consistent and derived its variance.
We applied this method to assess causal effects of PM2.5 constituents on cognitive function in the Nurses' Health Study.
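Regression calibration, which the summary names as part of the proposed estimator, can be illustrated in its simplest textbook form. The sketch below is hypothetical and is not the paper's double/debiased ML estimator. It assumes a single exposure, a validation subset where both the true exposure X and its error-prone proxy W are observed, and linear models throughout. The function names are invented.

```python
# Hypothetical sketch of classical linear regression calibration: a validation
# subset has both the true exposure X and its error-prone proxy W; the main
# study has only W and the outcome Y. This is the textbook single-exposure
# version, not the paper's double/debiased ML estimator.
from typing import List, Tuple


def ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Simple least squares fit y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def regression_calibration(w_val: List[float], x_val: List[float],
                           w_main: List[float], y_main: List[float]) -> Tuple[float, float]:
    """Replace W by an estimate of E[X | W], then fit the outcome model."""
    a, b = ols(w_val, x_val)               # calibration model X ~ W
    x_hat = [a + b * w for w in w_main]    # calibrated exposure in the main study
    return ols(x_hat, y_main)              # outcome model Y ~ E[X | W]


# Toy data (made up): X = 1 + 2W, Y = 1 + 3X.
print(regression_calibration([0, 1, 2, 3], [1, 3, 5, 7], [0, 1, 2], [4, 10, 16]))
# (1.0, 3.0); regressing Y on W directly would give slope 6.0 instead
```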
Date: 2024-09-22T01:48:56Z
- Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation [1.4149417323913716]
We study pretrained audio neural networks (CNN6, CNN10 and CNN14) and the Masked Autoencoder (Audio-MAE) for respiratory insufficiency (RI) detection.
For the regression task of estimating SpO2 levels, the models achieve root mean square error values exceeding the accepted clinical range of 3.5% for finger oximeters.
We transform SpO2-regression into a SpO2-threshold binary classification problem, with a threshold of 92%.
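The two quantities in this summary, RMSE compared against the 3.5% clinical limit and the 92% SpO2 threshold, can be computed as below. This is a minimal sketch: treating readings strictly below 92% as the positive (low-saturation) class is an assumption, and the function names are hypothetical.

```python
# Hypothetical sketch: RMSE against the 3.5% clinical limit quoted above, and
# the SpO2-threshold relabelling at 92%. Treating readings strictly below the
# threshold as the positive (low-saturation) class is an assumption.
import math
from typing import List

CLINICAL_RMSE_LIMIT = 3.5  # accepted RMSE range for finger oximeters, in % SpO2
SPO2_THRESHOLD = 92.0      # binary cut-off, in % SpO2


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean square error between reference and estimated SpO2."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def to_binary(spo2: List[float], threshold: float = SPO2_THRESHOLD) -> List[int]:
    """1 = SpO2 below the threshold (assumed positive class), 0 otherwise."""
    return [1 if s < threshold else 0 for s in spo2]


error = rmse([90.0, 95.0], [93.0, 91.0])
print(round(error, 3), error > CLINICAL_RMSE_LIMIT)  # 3.536 True
print(to_binary([91.9, 92.0, 97.0]))                  # [1, 0, 0]
```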
Date: 2024-07-30T17:26:16Z
- Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes [54.18828236350544]
Propensity score matching (PSM) addresses selection biases by selecting comparable populations for analysis.
Different matching methods can produce significantly different Average Treatment Effects (ATE) for the same task, even when meeting all validation criteria.
To address this issue, we introduce a novel metric, A2A, to reduce the number of valid matches.
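Propensity score matching itself can be sketched as greedy 1:1 nearest-neighbour matching with a caliper, followed by a matched-pair effect estimate. This is a generic, hypothetical illustration that assumes the propensity scores are already estimated; it does not implement the paper's A2A metric. Changing only the caliper shows how two valid-looking matching choices can yield different effect estimates for the same data.

```python
# Hypothetical sketch of greedy 1:1 nearest-neighbour propensity score matching
# (without replacement, with a caliper) followed by a matched-pair effect
# estimate. Propensity scores are assumed to be precomputed; this is a generic
# PSM illustration, not the A2A metric proposed in the paper.
from typing import List, Tuple

Unit = Tuple[float, float]  # (propensity score, outcome)


def match_nearest(treated: List[Unit], control: List[Unit],
                  caliper: float = 0.05) -> List[Tuple[int, int]]:
    """Pair each treated unit with the closest unused control within `caliper`."""
    available = set(range(len(control)))
    pairs = []
    for i, (ps_t, _) in enumerate(treated):
        best, best_dist = None, float("inf")
        for j in sorted(available):  # sorted so ties go to the lowest index
            dist = abs(ps_t - control[j][0])
            if dist < best_dist:
                best, best_dist = j, dist
        if best is not None and best_dist <= caliper:
            pairs.append((i, best))
            available.remove(best)
    return pairs


def matched_effect(treated: List[Unit], control: List[Unit],
                   pairs: List[Tuple[int, int]]) -> float:
    """Mean outcome difference over matched pairs (an ATT on the matched treated)."""
    if not pairs:
        raise ValueError("no matched pairs")
    return sum(treated[i][1] - control[j][1] for i, j in pairs) / len(pairs)


treated = [(0.30, 10.0), (0.70, 8.0), (0.10, 5.0)]
control = [(0.31, 7.0), (0.69, 6.0), (0.90, 1.0)]
pairs = match_nearest(treated, control, caliper=0.05)
print(pairs, matched_effect(treated, control, pairs))  # [(0, 0), (1, 1)] 2.5
wide = match_nearest(treated, control, caliper=1.0)
print(matched_effect(treated, control, wide))          # 3.0
```

With the narrow caliper the third treated unit stays unmatched; widening the caliper admits a poor match and moves the estimate from 2.5 to 3.0.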
Date: 2024-07-20T12:42:24Z
- Deep Learning-Based Correction and Unmixing of Hyperspectral Images for Brain Tumor Surgery [0.0]
We propose two deep learning models for correction and unmixing.
One is trained with protoporphyrin IX (PpIX) concentration labels.
The other undergoes semi-supervised training, first learning hyperspectral unmixing self-supervised and then learning to correct fluorescence emission spectra.
Date: 2024-02-06T07:04:35Z
- How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers? [49.35105290167996]
Pruning has emerged as a powerful technique for compressing deep neural networks, reducing memory usage and inference time without significantly affecting overall performance.
This work represents a first step toward understanding the impact of pruning on model behavior in deep long-tailed, multi-label medical image classification.
Date: 2023-08-17T20:40:30Z
- Clinical Deterioration Prediction in Brazilian Hospitals Based on Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD).
The XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
Date: 2022-12-17T23:29:14Z
- Building Brains: Subvolume Recombination for Data Augmentation in Large Vessel Occlusion Detection [56.67577446132946]
A large training data set is required for a standard deep learning-based model to learn this strategy from data.
We propose an augmentation method that generates artificial training samples by recombining vessel tree segmentations of the hemispheres from different patients.
In line with the augmentation scheme, we use a 3D-DenseNet fed with task-specific input, fostering a side-by-side comparison between the hemispheres.
Date: 2022-05-05T10:31:57Z
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
Date: 2021-04-07T06:02:04Z
- A proof of concept study for machine learning application to stenosis detection [0.0]
A virtual patient database (VPD) is created using a one-dimensional pulse wave propagation model of haemodynamics.
Four different machine learning (ML) methods are used to train and test a series of classifiers.
Date: 2021-02-11T19:39:33Z
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
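A doubly robust cross-fit estimator of the ACE can be sketched with augmented inverse probability weighting (AIPW): nuisance models are fitted on all folds but one, the held-out fold is scored, and the scores are averaged. To keep this hypothetical sketch dependency-free, simple stratum means stand in for the machine learning nuisance models and the covariate is a single discrete stratum label; the function names and the clipping constant `eps` are assumptions, not the study's settings.

```python
# Hypothetical sketch of a cross-fit augmented inverse probability weighting
# (AIPW, doubly robust) estimator of the average causal effect. To stay
# dependency-free, stratum means stand in for the machine learning nuisance
# models, and the covariate is a single discrete stratum label.
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, int, float]  # (covariate stratum x, treatment a in {0, 1}, outcome y)


def fit_nuisance(rows: List[Row]) -> Tuple[Dict[Tuple[int, int], float], Dict[int, float]]:
    """Outcome means mu[(x, a)] and propensity scores e[x] from training rows."""
    y_sum, y_n = defaultdict(float), defaultdict(int)
    a_sum, x_n = defaultdict(int), defaultdict(int)
    for x, a, y in rows:
        y_sum[(x, a)] += y
        y_n[(x, a)] += 1
        a_sum[x] += a
        x_n[x] += 1
    mu = {k: y_sum[k] / y_n[k] for k in y_n}
    e = {x: a_sum[x] / x_n[x] for x in x_n}
    return mu, e


def crossfit_aipw(rows: List[Row], k: int = 2, eps: float = 0.01) -> float:
    """Fit nuisances on k-1 folds, score the held-out fold, average all scores.
    Raises KeyError if a held-out stratum lacks a treatment arm in training."""
    folds = [[r for i, r in enumerate(rows) if i % k == f] for f in range(k)]
    scores = []
    for f in range(k):
        train = [r for g in range(k) if g != f for r in folds[g]]
        mu, e = fit_nuisance(train)
        for x, a, y in folds[f]:
            m1, m0 = mu[(x, 1)], mu[(x, 0)]
            p = min(max(e[x], eps), 1 - eps)  # clip propensity away from 0 and 1
            scores.append(m1 - m0 + a * (y - m1) / p - (1 - a) * (y - m0) / (1 - p))
    return sum(scores) / len(scores)


# Toy confounded data (made up): treatment is more common when x = 1,
# and y = 2a + 5x, so the true effect is 2.
base = [(0, 1), (0, 0), (0, 0), (0, 0), (1, 1), (1, 1), (1, 1), (1, 0)]
rows = [(x, a, 2.0 * a + 5.0 * x) for x, a in base for _ in range(2)]
print(crossfit_aipw(rows))  # prints 2.0; a naive treated-minus-control mean gives 4.5
```

Each base row is duplicated so that every training fold contains both treatment arms in every stratum; with real data, the nuisance models would be flexible learners and that overlap would have to be checked rather than constructed.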
Date: 2020-04-21T23:09:55Z
- Robust Modelling of Reflectance Pulse Oximetry for SpO$_2$ Estimation [0.0]
Continuous monitoring of blood oxygen saturation levels is vital for patients with pulmonary disorders.
Traditionally, SpO$_2$ monitoring has been carried out using transmittance pulse oximeters.
Reflectance pulse oximeters can be used at various sites, such as the finger, wrist, chest and forehead.
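Both transmittance and reflectance oximeters typically derive SpO2 from the red and infrared photoplethysmogram through the "ratio of ratios" R. The hypothetical sketch below uses the commonly quoted linear approximation SpO2 ≈ 110 − 25R purely for illustration. The coefficients are assumptions, real devices rely on empirically calibrated curves, and this is not the modelling approach of the paper above.

```python
# Hypothetical sketch of the classic "ratio of ratios" used in pulse oximetry.
# The linear map SpO2 = a - b * R with a = 110 and b = 25 is a commonly quoted
# textbook approximation, used here only for illustration; real devices
# (transmittance or reflectance) rely on empirically calibrated curves.


def ratio_of_ratios(ac_red: float, dc_red: float, ac_ir: float, dc_ir: float) -> float:
    """R = (AC_red / DC_red) / (AC_ir / DC_ir) from the pulsatile (AC) and
    baseline (DC) components of the red and infrared signals."""
    return (ac_red / dc_red) / (ac_ir / dc_ir)


def spo2_from_r(r: float, a: float = 110.0, b: float = 25.0) -> float:
    """Illustrative linear calibration; a and b are assumed, not device values."""
    return a - b * r


r = ratio_of_ratios(ac_red=0.02, dc_red=1.0, ac_ir=0.04, dc_ir=1.0)
print(r, spo2_from_r(r))  # 0.5 97.5
```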
Date: 2020-04-14T04:53:14Z