MIMIC-IV-Ext-PE: Using a large language model to predict pulmonary embolism phenotype in the MIMIC-IV dataset
- URL: http://arxiv.org/abs/2411.00044v1
- Date: Tue, 29 Oct 2024 19:28:44 GMT
- Title: MIMIC-IV-Ext-PE: Using a large language model to predict pulmonary embolism phenotype in the MIMIC-IV dataset
- Authors: B. D. Lam, S. Ma, I. Kovalenko, P. Wang, O. Jafari, A. Li, S. Horng,
- Abstract summary: Pulmonary embolism is a leading cause of preventable in-hospital mortality.
There are few large publicly available datasets that contain PE labels for research.
We extracted all available radiology reports of CTPA scans, and two physicians manually labeled the results as PE positive (acute PE) or PE negative.
We applied a previously finetuned Bio_ClinicalBERT transformer language model, VTE-BERT, to extract labels automatically.
- Score: 0.0
- License:
- Abstract: Pulmonary embolism (PE) is a leading cause of preventable in-hospital mortality. Advances in diagnosis, risk stratification, and prevention can improve outcomes. There are few large publicly available datasets that contain PE labels for research. Using the MIMIC-IV database, we extracted all available radiology reports of computed tomography pulmonary angiography (CTPA) scans and two physicians manually labeled the results as PE positive (acute PE) or PE negative. We then applied a previously finetuned Bio_ClinicalBERT transformer language model, VTE-BERT, to extract labels automatically. We verified VTE-BERT's reliability by measuring its performance against manual adjudication. We also compared the performance of VTE-BERT to diagnosis codes. We found that VTE-BERT has a sensitivity of 92.4% and positive predictive value (PPV) of 87.8% on all 19,942 patients with CTPA radiology reports from the emergency room and/or hospital admission. In contrast, diagnosis codes have a sensitivity of 95.4% and PPV of 83.8% on the subset of 11,990 hospitalized patients with discharge diagnosis codes. We successfully add nearly 20,000 labels to CTPAs in a publicly available dataset and demonstrate the external validity of a semi-supervised language model in accelerating hematologic research.
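The abstract reports sensitivity and positive predictive value (PPV) for labels checked against manual adjudication. As a minimal sketch of how those two metrics are computed from paired binary labels, the snippet below uses a small hypothetical toy example; the function name and the label lists are illustrative assumptions, not data or code from the paper.

```python
from typing import Sequence, Tuple


def sensitivity_and_ppv(truth: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, PPV) for binary labels where 1 = PE positive."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    # Count true positives, false negatives, and false positives.
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)

    # Sensitivity = TP / (TP + FN); PPV = TP / (TP + FP).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Hypothetical toy labels (NOT from MIMIC-IV): manual adjudication vs. model output.
manual_labels = [1, 1, 1, 0, 0, 0, 1, 0]
model_labels = [1, 1, 0, 0, 1, 0, 1, 0]

sens, ppv = sensitivity_and_ppv(manual_labels, model_labels)
print(f"sensitivity={sens:.3f}, PPV={ppv:.3f}")
```

Sensitivity answers "of the truly positive scans, how many did the labeler catch", while PPV answers "of the scans labeled positive, how many were truly positive", which is why the abstract quotes both for each labeling method.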
Related papers
- Diagnosis of Covid-19 Via Patient Breath Data Using Artificial Intelligence [0.0]
This study aims to develop a point-of-care testing (POCT) system that can detect COVID-19 by detecting volatile organic compounds (VOCs) in a patient's exhaled breath.
294 breath samples were collected from 142 patients at Istanbul Medipol Mega Hospital between December 2020 and March 2021.
The Gradient Boosting algorithm provides 95% recall when predicting COVID-19 positive patients and 96% accuracy when predicting COVID-19 negative patients.
arXiv Detail & Related papers (2023-01-24T22:00:00Z)
- A Novel Implementation of Machine Learning for the Efficient, Explainable Diagnosis of COVID-19 from Chest CT [0.0]
The aim of this study was to take a novel approach in the machine learning-based detection of COVID-19 from chest CT scans.
The proposed model attained an overall accuracy of 0.927 and a sensitivity of 0.958.
arXiv Detail & Related papers (2022-06-15T18:35:22Z)
- Dual-Attention Residual Network for Automatic Diagnosis of COVID-19 [6.941255691176647]
We propose a novel residual network to automatically identify COVID-19 from other common pneumonia and normal people using CT images.
Our method can differentiate COVID-19 from the other two classes with 94.7% accuracy, 93.73% sensitivity, 98.28% specificity, 95.26% F1-score, and an area under the receiver operating characteristic curve (AUC) of 0.99.
arXiv Detail & Related papers (2021-05-14T11:59:47Z)
- Quantification of pulmonary involvement in COVID-19 pneumonia by means of a cascade of two U-nets: training and assessment on multiple datasets using different annotation criteria [83.83783947027392]
This study aims at exploiting Artificial intelligence (AI) for the identification, segmentation and quantification of COVID-19 pulmonary lesions.
We developed an automated analysis pipeline, the LungQuant system, based on a cascade of two U-nets.
The accuracy of the LungQuant system in predicting the CT-Severity Score (CT-SS) has also been evaluated.
arXiv Detail & Related papers (2021-05-06T10:21:28Z)
- An Explainable AI System for Automated COVID-19 Assessment and Lesion Categorization from CT-scans [8.694504007704994]
COVID-19 infection caused by SARS-CoV-2 pathogen is a catastrophic pandemic outbreak all over the world.
We propose an AI-powered pipeline, based on the deep-learning paradigm, for automated COVID-19 detection and lesion categorization from CT scans.
arXiv Detail & Related papers (2021-01-28T11:47:35Z)
- Identification of Ischemic Heart Disease by using machine learning technique based on parameters measuring Heart Rate Variability [50.591267188664666]
In this study, 18 non-invasive features (age, gender, left ventricular ejection fraction, and 15 obtained from HRV) of 243 subjects were used to train and validate a series of artificial neural networks (ANNs).
The best result was obtained using 7 input parameters and 7 hidden nodes, with an accuracy of 98.9% and 82% on the training and validation datasets, respectively.
arXiv Detail & Related papers (2020-10-29T19:14:41Z)
- Integrative Analysis for COVID-19 Patient Outcome Prediction [53.11258640541513]
We combine radiomics of lung opacities and non-imaging features from demographic data, vital signs, and laboratory findings to predict need for intensive care unit admission.
Our methods may also be applied to other lung diseases including but not limited to community acquired pneumonia.
arXiv Detail & Related papers (2020-07-20T19:08:50Z)
- Predicting COVID-19 Pneumonia Severity on Chest X-ray with Deep Learning [57.00601760750389]
We present a severity score prediction model for COVID-19 pneumonia for frontal chest X-ray images.
Such a tool can gauge severity of COVID-19 lung infections that can be used for escalation or de-escalation of care.
arXiv Detail & Related papers (2020-05-24T23:13:16Z)
- 3D Tomographic Pattern Synthesis for Enhancing the Quantification of COVID-19 [13.424414148963566]
Coronavirus Disease (COVID-19) has affected 1.8 million people and resulted in more than 110,000 deaths as of April 12, 2020.
Tomographic patterns seen on chest Computed Tomography (CT), such as ground-glass opacities, consolidations, and crazy paving pattern, are correlated with the disease severity and progression.
We propose to use synthetic datasets to augment an existing COVID-19 database to tackle these challenges.
arXiv Detail & Related papers (2020-05-05T01:31:40Z)
- JCS: An Explainable COVID-19 Diagnosis System by Joint Classification and Segmentation [95.57532063232198]
Coronavirus disease 2019 (COVID-19) has caused a pandemic in over 200 countries.
To control the infection, identifying and separating the infected people is the most crucial step.
This paper develops a novel Joint Classification and Segmentation (JCS) system to perform real-time and explainable COVID-19 chest CT diagnosis.
arXiv Detail & Related papers (2020-04-15T12:30:40Z)
- Machine-Learning-Based Multiple Abnormality Prediction with Large-Scale Chest Computed Tomography Volumes [64.21642241351857]
We curated and analyzed a chest computed tomography (CT) data set of 36,316 volumes from 19,993 unique patients.
We developed a rule-based method for automatically extracting abnormality labels from free-text radiology reports.
We also developed a model for multi-organ, multi-disease classification of chest CT volumes.
arXiv Detail & Related papers (2020-02-12T00:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.