A Scalable Workflow to Build Machine Learning Classifiers with
Clinician-in-the-Loop to Identify Patients in Specific Diseases
- URL: http://arxiv.org/abs/2205.08891v1
- Date: Wed, 18 May 2022 12:24:07 GMT
- Authors: Jingqing Zhang, Atri Sharma, Luis Bolanos, Tong Li, Ashwani Tanwar,
Vibhor Gupta, Yike Guo
- Abstract summary: Clinicians may rely on medical coding systems such as International Classification of Diseases (ICD) to identify patients with diseases from Electronic Health Records (EHRs).
Recent studies suggest the ICD codes often cannot characterise patients accurately for specific diseases in real clinical practice.
This paper proposes a scalable workflow which leverages both structured data and unstructured textual notes from EHRs with techniques including NLP, AutoML and a Clinician-in-the-Loop mechanism.
- Score: 10.658425378457363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clinicians may rely on medical coding systems such as International
Classification of Diseases (ICD) to identify patients with diseases from
Electronic Health Records (EHRs). However, due to the lack of detail and
specificity as well as a probability of miscoding, recent studies suggest the
ICD codes often cannot characterise patients accurately for specific diseases
in real clinical practice, and as a result, using them to find patients for
studies or trials can result in high failure rates and missing out on uncoded
patients. Manual inspection of all patients at scale is not feasible as it is
highly costly and slow.
This paper proposes a scalable workflow which leverages both structured data
and unstructured textual notes from EHRs with techniques including NLP, AutoML
and a Clinician-in-the-Loop mechanism to build machine learning classifiers to
identify patients at scale with given diseases, especially those who might
currently be miscoded or missed by ICD codes.
Case studies in the MIMIC-III dataset were conducted where the proposed
workflow demonstrates a higher classification performance in terms of F1 scores
compared to simply using ICD codes on the gold testing subset to identify patients
with Ovarian Cancer (0.901 vs 0.814), Lung Cancer (0.859 vs 0.828), Cancer
Cachexia (0.862 vs 0.650), and Lupus Nephritis (0.959 vs 0.855). Also, the
proposed workflow that leverages unstructured notes consistently outperforms
the baseline that uses structured data only with an increase of F1 (Ovarian
Cancer 0.901 vs 0.719, Lung Cancer 0.859 vs 0.787, Cancer Cachexia 0.862 vs
0.838 and Lupus Nephritis 0.959 vs 0.785). Experiments on the large testing set
also demonstrate the proposed workflow can find more patients who are miscoded
or missed by ICD codes. Moreover, interpretability studies are also conducted
to clinically validate the top impact features of the classifiers.
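
The comparison reported in the abstract (F1 of ICD-code lookup versus F1 of a trained classifier, both measured against clinician-verified gold labels) can be illustrated with a minimal sketch. The labels below are hypothetical toy values, not data from the paper, and the names `f1_score`, `icd_flags` and `clf_preds` are assumptions introduced only for this illustration.

```python
from typing import Sequence


def f1_score(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """Binary F1 = 2*TP / (2*TP + FP + FN), where 1 means 'has the disease'."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy labels for eight patients (NOT from the paper).
gold = [1, 1, 1, 1, 0, 0, 0, 0]       # clinician-verified gold labels
icd_flags = [1, 1, 0, 0, 0, 0, 0, 1]  # 1 if the patient carries the ICD code
clf_preds = [1, 1, 1, 0, 0, 0, 0, 1]  # 1 if the classifier flags the patient

icd_f1 = f1_score(gold, icd_flags)
clf_f1 = f1_score(gold, clf_preds)

# Patients who truly have the disease, lack the ICD code, and are
# nevertheless found by the classifier.
missed_by_icd = [
    i for i, (g, c, p) in enumerate(zip(gold, icd_flags, clf_preds))
    if g == 1 and c == 0 and p == 1
]

print(f"ICD-code F1:   {icd_f1:.3f}")
print(f"Classifier F1: {clf_f1:.3f}")
print(f"Found by classifier but missed by ICD codes: {missed_by_icd}")
```

The last list corresponds to the group the abstract highlights: patients who are miscoded or missed by ICD codes but recovered by the classifier.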
Related papers
- Optimizing Mortality Prediction for ICU Heart Failure Patients: Leveraging XGBoost and Advanced Machine Learning with the MIMIC-III Database [1.5186937600119894]
Heart failure affects millions of people worldwide, significantly reducing quality of life and leading to high mortality rates.
Despite extensive research, the relationship between heart failure and mortality rates among ICU patients is not fully understood.
This study analyzed data from 1,177 patients over 18 years old from the MIMIC-III database, identified using ICD-9 codes.
arXiv Detail & Related papers (2024-09-03T07:57:08Z)
- A Federated Learning Framework for Stenosis Detection [70.27581181445329]
This study explores the use of Federated Learning (FL) for stenosis detection in coronary angiography images (CA).
Two heterogeneous datasets from two institutions were considered: dataset 1 includes 1219 images from 200 patients, which we acquired at the Ospedale Riuniti of Ancona (Italy);
dataset 2 includes 7492 sequential images from 90 patients from a previous study available in the literature.
arXiv Detail & Related papers (2023-10-30T11:13:40Z)
- Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset.
We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of the cirrhosis.
This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z)
- Learning to diagnose common thorax diseases on chest radiographs from radiology reports in Vietnamese [0.33598755777055367]
We propose a data collecting and annotation pipeline that extracts information from Vietnamese radiology reports to provide accurate labels for chest X-ray (CXR) images.
This can benefit Vietnamese radiologists and clinicians by annotating data that closely match their endemic diagnosis categories which may vary from country to country.
arXiv Detail & Related papers (2022-09-11T06:06:03Z)
- Self-supervised contrastive learning of echocardiogram videos enables label-efficient cardiac disease diagnosis [48.64462717254158]
We developed a self-supervised contrastive learning approach, EchoCLR, catered to echocardiogram videos.
When fine-tuned on small portions of labeled data, EchoCLR pretraining significantly improved classification performance for left ventricular hypertrophy (LVH) and aortic stenosis (AS).
EchoCLR is unique in its ability to learn representations of medical videos and demonstrates that SSL can enable label-efficient disease classification from small, labeled datasets.
arXiv Detail & Related papers (2022-07-23T19:17:26Z)
- A Deep Learning Based Workflow for Detection of Lung Nodules With Chest Radiograph [0.0]
We built a segmentation model to identify lung areas from CXRs, and sliced them into 16 patches.
These labeled patches were then used to fine-tune a deep neural network (DNN) model, classifying the patches as positive or negative.
arXiv Detail & Related papers (2021-12-19T16:19:46Z)
- Deep learning-based COVID-19 pneumonia classification using chest CT images: model generalizability [54.86482395312936]
Deep learning (DL) classification models were trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries.
We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split.
The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better.
arXiv Detail & Related papers (2021-02-18T21:14:52Z)
- Collaborative residual learners for automatic icd10 prediction using prescribed medications [45.82374977939355]
We propose a novel collaborative residual learning based model to automatically predict ICD10 codes employing only prescriptions data.
For multi-label classification, we obtain an average precision of 0.71 and 0.57, an F1-score of 0.57 and 0.38, and an accuracy of 0.73 and 0.44 in predicting the principal diagnosis, for inpatient and outpatient datasets respectively.
arXiv Detail & Related papers (2020-12-16T07:07:27Z)
- Deep Learning Applied to Chest X-Rays: Exploiting and Preventing Shortcuts [11.511323714777298]
This paper studies the case of spurious class skew in which patients with a particular attribute are spuriously more likely to have the outcome of interest.
We show that deep nets can accurately identify many patient attributes including sex (AUROC = 0.96) and age (AUROC >= 0.90) when learning to predict a diagnosis.
A simple transfer learning approach is surprisingly effective at preventing the shortcut and promoting good performance.
arXiv Detail & Related papers (2020-09-21T18:52:43Z)
- CovidDeep: SARS-CoV-2/COVID-19 Test Based on Wearable Medical Sensors and Efficient Neural Networks [51.589769497681175]
The novel coronavirus (SARS-CoV-2) has led to a pandemic.
The current testing regime based on Reverse Transcription-Polymerase Chain Reaction for SARS-CoV-2 has been unable to keep up with testing demands.
We propose a framework called CovidDeep that combines efficient DNNs with commercially available WMSs for pervasive testing of the virus.
arXiv Detail & Related papers (2020-07-20T21:47:28Z)