Read, Attend, and Code: Pushing the Limits of Medical Codes Prediction
from Clinical Notes by Machines
- URL: http://arxiv.org/abs/2107.10650v1
- Date: Sat, 10 Jul 2021 06:01:58 GMT
- Title: Read, Attend, and Code: Pushing the Limits of Medical Codes Prediction
from Clinical Notes by Machines
- Authors: Byung-Hak Kim and Varun Ganapathi
- Abstract summary: We present our Read, Attend, and Code (RAC) model for learning the medical code assignment mappings.
RAC establishes a new state of the art (SOTA) considerably outperforming the current best Macro-F1 by 18.7%.
This new milestone marks a meaningful step toward fully autonomous medical coding (AMC) in machines.
- Score: 0.42641920138420947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prediction of medical codes from clinical notes is both a practical and
essential need for every healthcare delivery organization within current
medical systems. Automating annotation will save significant time and excessive
effort spent by human coders today. However, the biggest challenge is directly
identifying appropriate medical codes out of several thousands of
high-dimensional codes from unstructured free-text clinical notes. In the past
three years, with Convolutional Neural Networks (CNN) and Long Short-Term
Memory (LSTM) networks, there have been vast improvements in tackling the most
challenging benchmark of the MIMIC-III-full-label inpatient clinical notes
dataset. This progress raises the fundamental question of how far automated
machine learning (ML) systems are from human coders' working performance. We
assessed the baseline of human coders' performance on the same subsampled
testing set. We also present our Read, Attend, and Code (RAC) model for
learning the medical code assignment mappings. By connecting convolved
embeddings with self-attention and code-title guided attention modules,
combined with sentence permutation-based data augmentations and stochastic
weight averaging training, RAC establishes a new state of the art (SOTA),
considerably outperforming the current best Macro-F1 by 18.7%, and reaches past
the human-level coding baseline. This new milestone marks a meaningful step
toward fully autonomous medical coding (AMC) in machines reaching parity with
human coders' performance in medical code prediction.
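The abstract composes convolved embeddings, self-attention, and code-title guided attention into a single multi-label classifier. Below is a minimal PyTorch sketch of that composition, not the authors' implementation: the class name, dimensions, and the randomly initialized per-code query vectors are assumptions made for illustration (RAC derives its attention queries from code-title text).

```python
import torch
import torch.nn as nn


class RACStyleCoder(nn.Module):
    """Convolved embeddings -> self-attention -> code-guided attention -> sigmoid."""

    def __init__(self, vocab_size, num_codes, emb_dim=128, conv_dim=256,
                 kernel=5, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # 1-D convolution yields the "convolved embeddings" of the abstract.
        self.conv = nn.Conv1d(emb_dim, conv_dim, kernel, padding=kernel // 2)
        # Self-attention contextualizes the convolved token sequence.
        self.self_attn = nn.MultiheadAttention(conv_dim, heads, batch_first=True)
        # One query vector per code; in RAC these are guided by code titles,
        # here they are simply learned parameters (an assumption of this sketch).
        self.code_queries = nn.Parameter(torch.randn(num_codes, conv_dim))
        self.out = nn.Linear(conv_dim, 1)

    def forward(self, tokens):                           # tokens: (batch, seq)
        h = self.embed(tokens).transpose(1, 2)           # (batch, emb, seq)
        h = torch.relu(self.conv(h)).transpose(1, 2)     # (batch, seq, conv)
        h, _ = self.self_attn(h, h, h)                   # (batch, seq, conv)
        q = self.code_queries.unsqueeze(0).expand(h.size(0), -1, -1)
        attn = torch.softmax(q @ h.transpose(1, 2), -1)  # (batch, codes, seq)
        per_code = attn @ h                              # (batch, codes, conv)
        return torch.sigmoid(self.out(per_code)).squeeze(-1)  # (batch, codes)


# Example: probs = RACStyleCoder(vocab_size=50000, num_codes=8000)(token_ids)
# where token_ids is a LongTensor of shape (batch, seq_len); sizes are placeholders.
```

The sentence permutation-based data augmentation and stochastic weight averaging mentioned in the abstract are training-time procedures and are omitted from this sketch.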
Related papers
- MedCodER: A Generative AI Assistant for Medical Coding [3.7153274758003967]
We introduce MedCodER, a Generative AI framework for automatic medical coding.
MedCodER achieves a micro-F1 score of 0.60 on International Classification of Diseases (ICD) code prediction.
We present a new dataset containing medical records annotated with disease diagnoses, ICD codes, and supporting evidence texts.
arXiv Detail & Related papers (2024-09-18T19:36:33Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study [60.56194508762205]
We reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models.
We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation.
We present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models.
arXiv Detail & Related papers (2023-04-21T11:54:44Z)
- Medical Codes Prediction from Clinical Notes: From Human Coders to Machines [0.21320960069210473]
Prediction of medical codes from clinical notes is a practical and essential need for every healthcare delivery organization.
The biggest challenge is directly identifying appropriate medical codes from several thousands of high-dimensional codes from unstructured free-text clinical notes.
Recent studies have shown the state-of-the-art code prediction results of full-fledged deep learning-based methods.
arXiv Detail & Related papers (2022-10-30T14:24:13Z)
- Can Current Explainability Help Provide References in Clinical Notes to Support Humans Annotate Medical Codes? [53.45585591262433]
We present an explainable Read, Attend, and Code (xRAC) framework and assess two approaches, attention score-based xRAC-ATTN and model-agnostic knowledge-distillation-based xRAC-KD.
We find that the supporting evidence text highlighted by xRAC-ATTN is of higher quality than xRAC-KD whereas xRAC-KD has potential advantages in production deployment scenarios.
arXiv Detail & Related papers (2022-10-28T04:06:07Z)
- The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation [81.72197368690031]
We present a new benchmarking suite designed specifically for medical sequential decision making.
The Medkit-Learn(ing) Environment is a publicly available Python package providing simple and easy access to high-fidelity synthetic medical data.
arXiv Detail & Related papers (2021-06-08T10:38:09Z)
- Multitask Recalibrated Aggregation Network for Medical Code Prediction [19.330911490203317]
We propose a multitask recalibrated aggregation network to solve the challenges of encoding lengthy and noisy clinical documents.
In particular, multitask learning shares information across different coding schemes and captures the dependencies between different medical codes.
Experiments with a real-world MIMIC-III dataset show significantly improved predictive performance.
arXiv Detail & Related papers (2021-04-02T09:22:10Z)
- TransICD: Transformer Based Code-wise Attention Model for Explainable ICD Coding [5.273190477622007]
The International Classification of Diseases (ICD) coding procedure has been shown to be effective and crucial to the billing system in the medical sector.
Currently, ICD codes are assigned to clinical notes manually, which is prone to error.
In this project, we apply a transformer-based architecture to capture the interdependence among the tokens of a document and then use a code-wise attention mechanism to learn code-specific representations of the entire document (a minimal sketch of code-wise attention appears after this list).
arXiv Detail & Related papers (2021-03-28T05:34:32Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
- Medical Code Assignment with Gated Convolution and Note-Code Interaction [39.079615516043674]
We propose a novel method, gated convolutional neural networks with note-code interaction (GatedCNN-NCI), for automatic medical code assignment.
With a novel note-code interaction design and a graph message passing mechanism, we explicitly capture the underlying dependency between notes and codes.
Our proposed model outperforms state-of-the-art models in most cases, and our model size is on par with lightweight baselines.
arXiv Detail & Related papers (2020-10-14T11:37:24Z)
- BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z)
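Several entries above (TransICD, GatedCNN-NCI, and the code-title guided attention in RAC itself) rely on code-wise, or label-wise, attention: every code learns its own attention distribution over the encoded tokens and pools them into a code-specific document vector. The following is a minimal sketch under the assumption of a generic encoder output H and dot-product scoring against per-code query vectors U; the function name and shapes are hypothetical, and the individual papers' exact formulations may differ.

```python
import torch


def code_wise_attention(H, U):
    """One code-specific document vector per medical code (hypothetical helper).

    H: (batch, seq_len, dim) encoder outputs, e.g. from a transformer
    U: (num_codes, dim)      learned per-code query vectors
    Returns: (batch, num_codes, dim) code-specific document representations.
    """
    # Attention weight of every code over every token position.
    alpha = torch.softmax(H @ U.T, dim=1)   # (batch, seq_len, num_codes)
    # Weighted sum of token states pools the note into one vector per code.
    return alpha.transpose(1, 2) @ H        # (batch, num_codes, dim)
```

A shared or per-code linear layer with a sigmoid on each code vector then produces the multi-label probabilities, and the attention weights themselves provide the kind of token-level evidence examined by the xRAC-ATTN entry above.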
This list is automatically generated from the titles and abstracts of the papers on this site.