Multi-label natural language processing to identify diagnosis and
procedure codes from MIMIC-III inpatient notes
- URL: http://arxiv.org/abs/2003.07507v1
- Date: Tue, 17 Mar 2020 02:56:27 GMT
- Title: Multi-label natural language processing to identify diagnosis and
procedure codes from MIMIC-III inpatient notes
- Authors: A.K. Bhavani Singh, Mounika Guntu, Ananth Reddy Bhimireddy, Judy W.
Gichoya, Saptarshi Purkayastha
- Abstract summary: In the United States, 25% of hospital spending, or more than 200 billion dollars, goes to administrative costs that involve medical coding and billing.
Natural language processing can automate the extraction of codes/labels from unstructured clinical notes.
Our model achieved an overall accuracy of 87.08%, an F1 score of 85.82%, and an AUC of 91.76% for top-10 codes.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the United States, 25% of hospital spending, or more than 200
billion dollars, goes to administrative costs that include services for medical
coding and billing. With the increasing number of patient records, manual
assignment of codes is overwhelming, time-consuming and error-prone, causing
billing errors. Natural language processing can automate the extraction of
codes/labels from unstructured clinical notes, which can help human coders
save time, increase productivity, and verify medical coding errors. Our
objective is to identify appropriate diagnosis and procedure codes
from clinical notes by performing multi-label classification. We used
de-identified data of critical care patients from the MIMIC-III database and
subset the data to select the ten (top-10) and fifty (top-50) most common
diagnoses and procedures, which cover 47.45% and 74.12% of all admissions,
respectively. We fine-tuned a state-of-the-art Bidirectional Encoder
Representations from Transformers (BERT) language model on 80% of the data
and validated it on the remaining 20%. The model achieved an overall
accuracy of 87.08%, an F1 score of 85.82%, and an AUC of 91.76% for top-10
codes. For the top-50 codes, our model achieved an overall accuracy of 93.76%,
an F1 score of 92.24%, and an AUC of 91%. When compared to previously published
research, our model performs better at predicting codes from clinical text. We
discuss approaches to generalize the knowledge discovery process of our
MIMIC-BERT to other clinical notes. This can help human coders save time and
prevent backlogs and additional costs due to coding errors.
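The data preparation and scoring steps described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the toy admissions, the helper names (`top_k_codes`, `coverage`, `multi_hot`, `split_80_20`, `elementwise_accuracy`, `micro_f1`) and the hypothetical predictions are all assumptions. The abstract does not say how coverage, accuracy, or F1 were computed, so the sketch assumes coverage means "admissions with at least one selected code", element-wise accuracy, and micro-averaged F1. The BERT fine-tuning step and AUC are left out.

```python
import random
from collections import Counter


def top_k_codes(admission_codes, k):
    """Return the k codes that appear in the most admissions."""
    counts = Counter(code for codes in admission_codes for code in set(codes))
    return [code for code, _ in counts.most_common(k)]


def coverage(admission_codes, labels):
    """Fraction of admissions with at least one selected code (assumed definition)."""
    wanted = set(labels)
    hits = sum(1 for codes in admission_codes if wanted & set(codes))
    return hits / len(admission_codes)


def multi_hot(codes, labels):
    """Encode one admission as a 0/1 vector over the selected labels."""
    present = set(codes)
    return [1 if label in present else 0 for label in labels]


def split_80_20(rows, seed=0):
    """Shuffle and split rows into 80% training and 20% validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def elementwise_accuracy(y_true, y_pred):
    """Share of individual label decisions that match (assumed accuracy variant)."""
    pairs = [(t, p) for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr)]
    return sum(1 for t, p in pairs if t == p) / len(pairs)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and negatives over all labels."""
    tp = fp = fn = 0
    for tr, pr in zip(y_true, y_pred):
        for t, p in zip(tr, pr):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy admissions, each a list of ICD-9-style code strings (made up for illustration).
admissions = [["4019", "4280"], ["4019"], ["42731", "4280"], ["5849"], ["4019"]]

labels = top_k_codes(admissions, k=2)
share = coverage(admissions, labels)
y_true = [multi_hot(codes, labels) for codes in admissions]
train, valid = split_80_20(list(zip(admissions, y_true)))

# Hypothetical model output: one missed label in the first admission.
y_pred = [[1, 0], [1, 0], [0, 1], [0, 0], [1, 0]]
acc = elementwise_accuracy(y_true, y_pred)
f1 = micro_f1(y_true, y_pred)
print(labels, share, len(train), len(valid), round(acc, 3), round(f1, 3))
```

Element-wise accuracy counts every correctly predicted absent label as a hit, so it can rise as the label set grows and most labels are absent for most admissions. Whether that explains the reported top-50 accuracy depends on the averaging the paper actually used.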
Related papers
- Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding [44.01429184037945]
We introduce ALIGN, a novel compositional LLM-based system for automated, zero-shot medical coding.
We evaluate ALIGN on harmonizing medication terms into Anatomical Therapeutic Chemical (ATC) and medical history terms into Medical Dictionary for Regulatory Activities (MedDRA) codes.
arXiv Detail & Related papers (2024-11-20T09:59:12Z)
- MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records [60.35217378132709]
Large language models (LLMs) can follow natural language instructions with human-level fluency.
Evaluating LLMs on realistic text generation tasks for healthcare remains challenging.
We introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data.
arXiv Detail & Related papers (2023-08-27T12:24:39Z)
- Autoencoder-based prediction of ICU clinical codes [0.0]
We use the MIMIC-III dataset to frame the task of completing the clinical codes as a recommendation problem.
We consider various autoencoder approaches plus two strong baselines: item co-occurrence and Singular Value Decomposition.
Using clinical variables in addition to the incomplete code list improves the predictive performance of the models.
arXiv Detail & Related papers (2023-05-08T18:56:37Z)
- Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study [60.56194508762205]
We reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models.
We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation.
We present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models.
arXiv Detail & Related papers (2023-04-21T11:54:44Z)
- Active learning for medical code assignment [55.99831806138029]
We demonstrate the effectiveness of Active Learning (AL) in multi-label text classification in the clinical domain.
We apply a set of well-known AL methods to help automatically assign ICD-9 codes on the MIMIC-III dataset.
Our results show that the selection of informative instances provides satisfactory classification with a significantly reduced training set.
arXiv Detail & Related papers (2021-04-12T18:11:17Z)
- TransICD: Transformer Based Code-wise Attention Model for Explainable ICD Coding [5.273190477622007]
The International Classification of Diseases (ICD) coding procedure has been shown to be effective and crucial to the billing system in the medical sector.
Currently, ICD codes are assigned to a clinical note manually, which is likely to cause many errors.
In this project, we apply a transformer-based architecture to capture the interdependence among the tokens of a document and then use a code-wise attention mechanism to learn code-specific representations of the entire document.
arXiv Detail & Related papers (2021-03-28T05:34:32Z)
- Collaborative residual learners for automatic icd10 prediction using prescribed medications [45.82374977939355]
We propose a novel collaborative residual learning based model to automatically predict ICD10 codes employing only prescriptions data.
For multi-label classification, we obtain an average precision of 0.71 and 0.57 and an F1-score of 0.57 and 0.38, along with an accuracy of 0.73 and 0.44 in predicting the principal diagnosis, for the inpatient and outpatient datasets respectively.
arXiv Detail & Related papers (2020-12-16T07:07:27Z)
- Ensemble model for pre-discharge icd10 coding prediction [45.82374977939355]
We propose an ensemble model incorporating multiple clinical data sources for accurate code predictions.
For multi-label classification, we obtain an average precision of 0.73 and 0.58 and an F1-score of 0.56 and 0.35, along with an accuracy of 0.71 and 0.4 in predicting the principal diagnosis, for the inpatient and outpatient datasets respectively.
arXiv Detail & Related papers (2020-12-16T07:02:56Z)
- Natural language processing of MIMIC-III clinical notes for identifying diagnosis and procedures with neural networks [0.0]
We report the performance of a natural language processing model that can map clinical notes to medical codes.
We employed a state-of-the-art deep learning method, ULMFiT, on the largest emergency department clinical notes dataset, MIMIC-III.
Our models were able to predict the top-10 diagnoses and procedures with 80.3% and 80.5% accuracy, whereas the top-50 ICD-9 codes of diagnosis and procedures are predicted with 70.7% and 63.9% accuracy.
arXiv Detail & Related papers (2019-12-28T04:05:15Z)