Related papers: A Scalable Workflow to Build Machine Learning Classifiers with Clinician-in-the-Loop to Identify Patients in Specific Diseases

A Scalable Workflow to Build Machine Learning Classifiers with Clinician-in-the-Loop to Identify Patients in Specific Diseases

URL: http://arxiv.org/abs/2205.08891v1
Date: Wed, 18 May 2022 12:24:07 GMT
Title: A Scalable Workflow to Build Machine Learning Classifiers with Clinician-in-the-Loop to Identify Patients in Specific Diseases
Authors: Jingqing Zhang, Atri Sharma, Luis Bolanos, Tong Li, Ashwani Tanwar, Vibhor Gupta, Yike Guo
Abstract summary: Clinicians may rely on medical coding systems such as International Classification of Diseases (ICD) to identify patients with diseases from Electronic Health Records (EHRs) Recent studies suggest the ICD codes often cannot characterise patients accurately for specific diseases in real clinical practice. This paper proposes a scalable workflow which leverages both structured data and unstructured textual notes from EHRs with techniques including NLP, AutoML and Clinician-in-the-Loop mechanism.
Score: 10.658425378457363
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Clinicians may rely on medical coding systems such as International Classification of Diseases (ICD) to identify patients with diseases from Electronic Health Records (EHRs). However, due to the lack of detail and specificity as well as a probability of miscoding, recent studies suggest the ICD codes often cannot characterise patients accurately for specific diseases in real clinical practice, and as a result, using them to find patients for studies or trials can result in high failure rates and missing out on uncoded patients. Manual inspection of all patients at scale is not feasible as it is highly costly and slow. This paper proposes a scalable workflow which leverages both structured data and unstructured textual notes from EHRs with techniques including NLP, AutoML and Clinician-in-the-Loop mechanism to build machine learning classifiers to identify patients at scale with given diseases, especially those who might currently be miscoded or missed by ICD codes. Case studies in the MIMIC-III dataset were conducted where the proposed workflow demonstrates a higher classification performance in terms of F1 scores compared to simply using ICD codes on gold testing subset to identify patients with Ovarian Cancer (0.901 vs 0.814), Lung Cancer (0.859 vs 0.828), Cancer Cachexia (0.862 vs 0.650), and Lupus Nephritis (0.959 vs 0.855). Also, the proposed workflow that leverages unstructured notes consistently outperforms the baseline that uses structured data only with an increase of F1 (Ovarian Cancer 0.901 vs 0.719, Lung Cancer 0.859 vs 0.787, Cancer Cachexia 0.862 vs 0.838 and Lupus Nephritis 0.959 vs 0.785). Experiments on the large testing set also demonstrate the proposed workflow can find more patients who are miscoded or missed by ICD codes. Moreover, interpretability studies are also conducted to clinically validate the top impact features of the classifiers.

Related papers

ECTIL: Label-efficient Computational Tumour Infiltrating Lymphocyte (TIL) assessment in breast cancer: Multicentre validation in 2,340 patients with breast cancer [17.91294880294883]
Level of tumour-infiltrating lymphocytes (TILs) is a prognostic factor for patients with (triple-negative) breast cancer. Current Computational TIL assessment (CTA) models rely heavily on many detailed annotations. We propose a fundamentally simpler deep learning based model that can be trained in only ten minutes on hundredfold fewer pathologist annotations.
arXiv Detail & Related papers (2025-01-24T10:28:05Z)
Optimizing Mortality Prediction for ICU Heart Failure Patients: Leveraging XGBoost and Advanced Machine Learning with the MIMIC-III Database [1.5186937600119894]
Heart failure affects millions of people worldwide, significantly reducing quality of life and leading to high mortality rates. Despite extensive research, the relationship between heart failure and mortality rates among ICU patients is not fully understood. This study analyzed data from 1,177 patients over 18 years old from the MIMIC-III database, identified using ICD-9 codes.
arXiv Detail & Related papers (2024-09-03T07:57:08Z)
A Federated Learning Framework for Stenosis Detection [70.27581181445329]
This study explores the use of Federated Learning (FL) for stenosis detection in coronary angiography images (CA) Two heterogeneous datasets from two institutions were considered: dataset 1 includes 1219 images from 200 patients, which we acquired at the Ospedale Riuniti of Ancona (Italy) dataset 2 includes 7492 sequential images from 90 patients from a previous study available in the literature.
arXiv Detail & Related papers (2023-10-30T11:13:40Z)
Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset. We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of the cirrhosis. This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z)
Learning to diagnose common thorax diseases on chest radiographs from radiology reports in Vietnamese [0.33598755777055367]
We propose a data collecting and annotation pipeline that extracts information from Vietnamese radiology reports to provide accurate labels for chest X-ray (CXR) images. This can benefit Vietnamese radiologists and clinicians by annotating data that closely match their endemic diagnosis categories which may vary from country to country.
arXiv Detail & Related papers (2022-09-11T06:06:03Z)
Self-supervised contrastive learning of echocardiogram videos enables label-efficient cardiac disease diagnosis [48.64462717254158]
We developed a self-supervised contrastive learning approach, EchoCLR, to catered to echocardiogram videos. When fine-tuned on small portions of labeled data, EchoCLR pretraining significantly improved classification performance for left ventricular hypertrophy (LVH) and aortic stenosis (AS) EchoCLR is unique in its ability to learn representations of medical videos and demonstrates that SSL can enable label-efficient disease classification from small, labeled datasets.
arXiv Detail & Related papers (2022-07-23T19:17:26Z)
A Deep Learning Based Workflow for Detection of Lung Nodules With Chest Radiograph [0.0]
We built a segmentation model to identify lung areas from CXRs, and sliced them into 16 patches. These labeled patches were then used to train finetune a deep neural network(DNN) model, classifying the patches as positive or negative.
arXiv Detail & Related papers (2021-12-19T16:19:46Z)
Deep learning-based COVID-19 pneumonia classification using chest CT images: model generalizability [54.86482395312936]
Deep learning (DL) classification models were trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries. We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split. The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better.
arXiv Detail & Related papers (2021-02-18T21:14:52Z)
Collaborative residual learners for automatic icd10 prediction using prescribed medications [45.82374977939355]
We propose a novel collaborative residual learning based model to automatically predict ICD10 codes employing only prescriptions data. We obtain multi-label classification accuracy of 0.71 and 0.57 of average precision, 0.57 and 0.38 of F1-score and 0.73 and 0.44 of accuracy in predicting principal diagnosis for inpatient and outpatient datasets respectively.
arXiv Detail & Related papers (2020-12-16T07:07:27Z)
Ensemble model for pre-discharge icd10 coding prediction [45.82374977939355]
We propose an ensemble model incorporating multiple clinical data sources for accurate code predictions. We obtain multi-label classification accuracies of 0.73 and 0.58 for average precision, 0.56 and 0.35 for F1-scores and 0.71 and 0.4 accuracy in predicting principal diagnosis for inpatient and outpatient datasets respectively.
arXiv Detail & Related papers (2020-12-16T07:02:56Z)
Deep Learning Applied to Chest X-Rays: Exploiting and Preventing Shortcuts [11.511323714777298]
This paper studies the case of spurious class skew in which patients with a particular attribute are spuriously more likely to have the outcome of interest. We show that deep nets can accurately identify many patient attributes including sex (AUROC = 0.96) and age (AUROC >= 0.90) when learning to predict a diagnosis. A simple transfer learning approach is surprisingly effective at preventing the shortcut and promoting good performance.
arXiv Detail & Related papers (2020-09-21T18:52:43Z)
CovidDeep: SARS-CoV-2/COVID-19 Test Based on Wearable Medical Sensors and Efficient Neural Networks [51.589769497681175]
The novel coronavirus (SARS-CoV-2) has led to a pandemic. The current testing regime based on Reverse Transcription-Polymerase Chain Reaction for SARS-CoV-2 has been unable to keep up with testing demands. We propose a framework called CovidDeep that combines efficient DNNs with commercially available WMSs for pervasive testing of the virus.
arXiv Detail & Related papers (2020-07-20T21:47:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.