Natural language processing of MIMIC-III clinical notes for identifying
diagnosis and procedures with neural networks
- URL: http://arxiv.org/abs/1912.12397v1
- Date: Sat, 28 Dec 2019 04:05:15 GMT
- Title: Natural language processing of MIMIC-III clinical notes for identifying
diagnosis and procedures with neural networks
- Authors: Siddhartha Nuthakki, Sunil Neela, Judy W. Gichoya, Saptarshi
Purkayastha
- Abstract summary: We report the performance of a natural language processing model that can map clinical notes to medical codes.
We employed state-of-the-art deep learning method, ULMFiT on the largest emergency department clinical notes dataset MIMIC III.
Our models were able to predict the top-10 diagnoses and procedures with 80.3% and 80.5% accuracy, whereas the top-50 ICD-9 codes of diagnosis and procedures are predicted with 70.7% and 63.9% accuracy.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Coding diagnosis and procedures in medical records is a crucial process in
the healthcare industry, which includes the creation of accurate billings,
receiving reimbursements from payers, and creating standardized patient care
records. In the United States, Billing and Insurance related activities cost
around $471 billion in 2012 which constitutes about 25% of all the U.S hospital
spending. In this paper, we report the performance of a natural language
processing model that can map clinical notes to medical codes, and predict
final diagnosis from unstructured entries of history of present illness,
symptoms at the time of admission, etc. Previous studies have demonstrated that
deep learning models perform better at such mapping when compared to
conventional machine learning models. Therefore, we employed state-of-the-art
deep learning method, ULMFiT on the largest emergency department clinical notes
dataset MIMIC III which has 1.2M clinical notes to select for the top-10 and
top-50 diagnosis and procedure codes. Our models were able to predict the
top-10 diagnoses and procedures with 80.3% and 80.5% accuracy, whereas the
top-50 ICD-9 codes of diagnosis and procedures are predicted with 70.7% and
63.9% accuracy. Prediction of diagnosis and procedures from unstructured
clinical notes benefit human coders to save time, eliminate errors and minimize
costs. With promising scores from our present model, the next step would be to
deploy this on a small-scale real-world scenario and compare it with human
coders as the gold standard. We believe that further research of this approach
can create highly accurate predictions that can ease the workflow in a clinical
setting.
Related papers
- CPLLM: Clinical Prediction with Large Language Models [0.07083082555458872]
We present a method that involves fine-tuning a pre-trained Large Language Model (LLM) for clinical disease and readmission prediction.
For diagnosis prediction, we predict whether patients will be diagnosed with a target disease during their next visit or in the subsequent diagnosis, leveraging their historical diagnosis records.
Our experiments have shown that our proposed method, CPLLM, surpasses all the tested models in terms of PR-AUC and ROC-AUC metrics.
arXiv Detail & Related papers (2023-09-20T13:24:12Z) - Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review
and Replicability Study [60.56194508762205]
We reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models.
We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation.
We present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models.
arXiv Detail & Related papers (2023-04-21T11:54:44Z) - Foresight -- Deep Generative Modelling of Patient Timelines using
Electronic Health Records [46.024501445093755]
Temporal modelling of medical history can be used to forecast and simulate future events, estimate risk, suggest alternative diagnoses or forecast complications.
We present Foresight, a novel GPT3-based pipeline that uses NER+L tools (i.e. MedCAT) to convert document text into structured, coded concepts.
arXiv Detail & Related papers (2022-12-13T19:06:00Z) - MedML: Fusing Medical Knowledge and Machine Learning Models for Early
Pediatric COVID-19 Hospitalization and Severity Prediction [27.352097332678213]
We respond to the national Pediatric COVID-19 data challenge with a novel machine learning model, MedML.
MedML extracts the most predictive features based on medical knowledge and propensity scores from over 6 million medical concepts.
We evaluate MedML across 143,605 patients for the hospitalization prediction task and 11,465 patients for the severity prediction task.
arXiv Detail & Related papers (2022-07-25T15:56:14Z) - Literature-Augmented Clinical Outcome Prediction [10.46990394710927]
We introduce techniques to help bridge this gap between EBM and AI-based clinical models.
We propose a novel system that automatically retrieves patient-specific literature based on intensive care (ICU) patient information.
Our model is able to substantially boost predictive accuracy on three challenging tasks in comparison to strong recent baselines.
arXiv Detail & Related papers (2021-11-16T11:19:02Z) - Active learning for medical code assignment [55.99831806138029]
We demonstrate the effectiveness of Active Learning (AL) in multi-label text classification in the clinical domain.
We apply a set of well-known AL methods to help automatically assign ICD-9 codes on the MIMIC-III dataset.
Our results show that the selection of informative instances provides satisfactory classification with a significantly reduced training set.
arXiv Detail & Related papers (2021-04-12T18:11:17Z) - Clinical Outcome Prediction from Admission Notes using Self-Supervised
Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z) - Ensemble model for pre-discharge icd10 coding prediction [45.82374977939355]
We propose an ensemble model incorporating multiple clinical data sources for accurate code predictions.
We obtain multi-label classification accuracies of 0.73 and 0.58 for average precision, 0.56 and 0.35 for F1-scores and 0.71 and 0.4 accuracy in predicting principal diagnosis for inpatient and outpatient datasets respectively.
arXiv Detail & Related papers (2020-12-16T07:02:56Z) - Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
COVID-19 pandemics has challenged emergency response systems worldwide, with widespread reports of essential services breakdown and collapse of health care structure.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z) - Multi-label natural language processing to identify diagnosis and
procedure codes from MIMIC-III inpatient notes [0.0]
In the United States, 25% or greater than 200 billion dollars of hospital spending accounts for administrative costs that involve medical coding and billing.
Natural language processing can automate the extraction of codes/labels from unstructured clinical notes.
Our model achieved an overall accuracy of 87.08%, an F1 score of 85.82%, and an AUC of 91.76% for top-10 codes.
arXiv Detail & Related papers (2020-03-17T02:56:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.