MDACE: MIMIC Documents Annotated with Code Evidence
- URL: http://arxiv.org/abs/2307.03859v1
- Date: Fri, 7 Jul 2023 22:45:59 GMT
- Title: MDACE: MIMIC Documents Annotated with Code Evidence
- Authors: Hua Cheng, Rana Jafari, April Russell, Russell Klopfer, Edmond Lu,
Benjamin Striner, Matthew R. Gormley
- Abstract summary: We introduce a dataset for evidence/rationale extraction on an extreme multi-label classification task over long medical documents.
The dataset -- annotated by professional medical coders -- consists of 302 Inpatient charts with 3,934 evidence spans and 52 Profee charts with 5,563 evidence spans.
- Score: 7.200839302089557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a dataset for evidence/rationale extraction on an extreme
multi-label classification task over long medical documents. One such task is
Computer-Assisted Coding (CAC), which has improved significantly in recent
years, thanks to advances in machine learning technologies. Yet simply
predicting a set of final codes for a patient encounter is insufficient as CAC
systems are required to provide supporting textual evidence to justify the
billing codes. A model able to produce accurate and reliable supporting
evidence for each code would be a tremendous benefit. However, a human
annotated code evidence corpus is extremely difficult to create because it
requires specialized knowledge. In this paper, we introduce MDACE, the first
publicly available code evidence dataset, which is built on a subset of the
MIMIC-III clinical records. The dataset -- annotated by professional medical
coders -- consists of 302 Inpatient charts with 3,934 evidence spans and 52
Profee charts with 5,563 evidence spans. We implemented several evidence
extraction methods based on the EffectiveCAN model (Liu et al., 2021) to
establish baseline performance on this dataset. MDACE can be used to evaluate
code evidence extraction methods for CAC systems, as well as the accuracy and
interpretability of deep learning models for multi-label classification. We
believe that the release of MDACE will greatly improve the understanding and
application of deep learning technologies for medical coding and document
classification.
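To make the evidence-extraction evaluation concrete, here is a minimal sketch of character-overlap scoring between predicted and gold evidence spans; the `(begin, end)` span representation is an illustrative assumption, not the released MDACE schema.

```python
from typing import List, Tuple

# A span is (begin, end) in character offsets, end-exclusive. This format is
# an assumption for illustration, not the released MDACE schema.
Span = Tuple[int, int]

def overlap(a: Span, b: Span) -> int:
    """Number of characters shared by two spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def span_f1(predicted: List[Span], gold: List[Span]) -> float:
    """Character-overlap F1; assumes spans within each list are disjoint."""
    pred_len = sum(e - b for b, e in predicted)
    gold_len = sum(e - b for b, e in gold)
    if pred_len == 0 or gold_len == 0:
        return 0.0
    hit = sum(overlap(p, g) for p in predicted for g in gold)
    precision, recall = hit / pred_len, hit / gold_len
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A prediction that half-covers a single gold evidence span.
print(span_f1(predicted=[(10, 30)], gold=[(20, 40)]))  # 0.5
```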
Related papers
- The Anatomy of Evidence: An Investigation Into Explainable ICD Coding [0.0]
We conduct an in-depth analysis of the MDACE dataset and perform a plausibility evaluation of current explainable medical coding systems. Our findings reveal that ground-truth evidence aligns with code descriptions to a certain degree. An investigation into state-of-the-art approaches shows a high overlap with ground-truth evidence.
arXiv Detail & Related papers (2025-07-02T15:21:29Z)
- MedCodER: A Generative AI Assistant for Medical Coding [3.7153274758003967]
We introduce MedCodER, a Generative AI framework for automatic medical coding.
MedCodER achieves a micro-F1 score of 0.60 on International Classification of Diseases (ICD) code prediction.
We present a new dataset containing medical records annotated with disease diagnoses, ICD codes, and supporting evidence texts.
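Micro-F1, the metric quoted above, pools true positives, false positives, and false negatives across all charts and codes before computing F1. A minimal sketch, with illustrative ICD-10 code sets:

```python
def micro_f1(predictions, references):
    """Micro-averaged F1: pool counts over all charts and all codes."""
    tp = fp = fn = 0
    for pred, gold in zip(predictions, references):
        pred, gold = set(pred), set(gold)
        tp += len(pred & gold)   # correctly predicted codes
        fp += len(pred - gold)   # predicted but not in gold
        fn += len(gold - pred)   # gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Two charts with illustrative ICD-10 codes.
preds = [{"E11.9", "I10"}, {"J18.9"}]
golds = [{"E11.9"}, {"J18.9", "I10"}]
print(micro_f1(preds, golds))  # ~0.667, pooled over all codes
```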
arXiv Detail & Related papers (2024-09-18T19:36:33Z)
- A Two-Stage Decoder for Efficient ICD Coding [10.634394331433322]
We propose a two-stage decoding mechanism to predict ICD codes.
First, we predict the parent code, and then predict the child code conditioned on that prediction.
Experiments on the public MIMIC-III data set show that our model performs well in single-model settings.
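None of this reproduces the paper's decoder, but the parent-then-child idea can be sketched with a toy two-level hierarchy and made-up scores:

```python
# Toy ICD hierarchy: parent category -> child codes (illustrative only).
HIERARCHY = {
    "E11": ["E11.9", "E11.65"],   # type 2 diabetes
    "I10": ["I10"],               # essential hypertension
}

def two_stage_decode(parent_scores, child_scores,
                     parent_threshold=0.5, child_threshold=0.5):
    """Stage 1: keep parent categories above threshold.
    Stage 2: score children only under the kept parents."""
    predicted = []
    for parent, p_score in parent_scores.items():
        if p_score < parent_threshold:
            continue  # prune the whole subtree, saving child evaluations
        for child in HIERARCHY.get(parent, []):
            if child_scores.get(child, 0.0) >= child_threshold:
                predicted.append(child)
    return predicted

# Made-up model scores for one chart.
parents = {"E11": 0.9, "I10": 0.2}
children = {"E11.9": 0.8, "E11.65": 0.3, "I10": 0.9}
print(two_stage_decode(parents, children))  # ['E11.9']
```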
arXiv Detail & Related papers (2023-05-27T17:25:13Z)
- Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study [60.56194508762205]
We reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models.
We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation.
We present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models.
arXiv Detail & Related papers (2023-04-21T11:54:44Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
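A minimal sketch of merging model spans with manual rules follows; the regexes, the example note, and the `model_spans` input are illustrative placeholders rather than the paper's hybrid system:

```python
import re

# Placeholder rules for two identifier types; a real system has many more.
RULES = {
    "PHONE": re.compile(r"\b\d{2}(?:[ .]\d{2}){4}\b"),
    "DATE": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}

def rule_spans(text):
    """Spans found by handcrafted rules, as (begin, end, label)."""
    return [(m.start(), m.end(), label)
            for label, rx in RULES.items() for m in rx.finditer(text)]

def merge(model_spans, text):
    """Union of model predictions and rule matches; rules win on overlap."""
    merged = rule_spans(text)
    covered = [(b, e) for b, e, _ in merged]
    for b, e, label in model_spans:
        if all(e <= cb or b >= ce for cb, ce in covered):
            merged.append((b, e, label))
    return sorted(merged)

def pseudonymize(text, spans):
    """Replace each identified span with its entity label, right to left."""
    for b, e, label in sorted(spans, reverse=True):
        text = text[:b] + f"<{label}>" + text[e:]
    return text

note = "Seen on 12/03/2021, call 01 23 45 67 89. Patient: Jane Doe."
b = note.index("Jane Doe")  # pretend a deep learning model found the name
spans = merge([(b, b + len("Jane Doe"), "NAME")], note)
print(pseudonymize(note, spans))
# Seen on <DATE>, call <PHONE>. Patient: <NAME>.
```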
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstring for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- Can Current Explainability Help Provide References in Clinical Notes to Support Humans Annotate Medical Codes? [53.45585591262433]
We present an explainable Read, Attend, and Code (xRAC) framework and assess two approaches, attention score-based xRAC-ATTN and model-agnostic knowledge-distillation-based xRAC-KD.
We find that the supporting evidence text highlighted by xRAC-ATTN is of higher quality than that of xRAC-KD, whereas xRAC-KD has potential advantages in production deployment scenarios.
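Attention-score-based evidence selection, the idea behind xRAC-ATTN, can be sketched generically: rank tokens by a code-specific attention weight and keep the top-k as supporting text. The tokens and weights below are made up:

```python
import numpy as np

def top_attention_evidence(tokens, attention, k=3):
    """Return the k tokens with the highest attention weight for one code."""
    order = np.argsort(attention)[::-1][:k]  # indices of top-k weights
    return [(tokens[i], float(attention[i])) for i in sorted(order)]

tokens = ["patient", "has", "type", "2", "diabetes", "on", "metformin"]
# Made-up per-token attention for code E11.9; a real model produces these.
attention = np.array([0.02, 0.01, 0.25, 0.20, 0.40, 0.02, 0.10])
print(top_attention_evidence(tokens, attention))
# [('type', 0.25), ('2', 0.2), ('diabetes', 0.4)]
```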
arXiv Detail & Related papers (2022-10-28T04:06:07Z)
- A Systematic Literature Review of Automated ICD Coding and Classification Systems using Discharge Summaries [5.156484100374058]
Codification of free-text clinical narratives has long been recognised to be beneficial for secondary uses such as funding, insurance claim processing and research.
Assigning codes is currently a manual process that is expensive, time-consuming, and error-prone.
This systematic literature review provides a comprehensive overview of automated clinical coding systems.
arXiv Detail & Related papers (2021-07-12T03:55:17Z)
- Read, Attend, and Code: Pushing the Limits of Medical Codes Prediction from Clinical Notes by Machines [0.42641920138420947]
We present our Read, Attend, and Code (RAC) model for learning the medical code assignment mappings.
RAC establishes a new state of the art (SOTA), exceeding the previous best Macro-F1 by 18.7%.
This new milestone marks a meaningful step toward fully autonomous medical coding (AMC) in machines.
arXiv Detail & Related papers (2021-07-10T06:01:58Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
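Meta-embedding itself is simple to illustrate: combine word vectors from several sources into one representation, for example by averaging vectors already projected to a common dimension. The toy vectors below are illustrative, not the paper's training procedure:

```python
import numpy as np

def meta_embed(embeddings):
    """Average several embeddings of the same word into one vector.
    Assumes they were already projected to a common dimension."""
    return np.mean(np.stack(embeddings), axis=0)

# Toy 4-dimensional vectors for "hypertension" from two sources.
clinical_vec = np.array([0.2, -0.1, 0.7, 0.0])   # e.g. from routine notes
scientific_vec = np.array([0.4, 0.1, 0.5, 0.2])  # e.g. from articles
print(meta_embed([clinical_vec, scientific_vec]))  # [0.3 0.  0.6 0.1]
```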
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
- An Explainable CNN Approach for Medical Codes Prediction from Clinical Text [1.7746314978241657]
We develop CNN-based methods for automatic ICD coding based on clinical text from intensive care unit (ICU) stays.
We come up with the Shallow and Wide Attention convolutional Mechanism (SWAM), which allows our model to learn local and low-level features for each label.
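A generic per-label attention head over convolutional features conveys the gist of label-wise attention; random tensors stand in for real note features, and this sketch is not the SWAM implementation:

```python
import torch

batch, seq_len, channels, num_labels = 2, 100, 64, 5

# Stand-in for CNN output over a note: (batch, seq_len, channels).
features = torch.randn(batch, seq_len, channels)

# One learned query vector per label attends over token positions.
label_queries = torch.nn.Parameter(torch.randn(num_labels, channels))

scores = torch.einsum("bsc,lc->bls", features, label_queries)  # (batch, labels, seq)
weights = torch.softmax(scores, dim=-1)                        # where each label looks
label_repr = torch.einsum("bls,bsc->blc", weights, features)   # per-label doc vector

classifier = torch.nn.Linear(channels, 1)
logits = classifier(label_repr).squeeze(-1)  # (batch, num_labels)
print(torch.sigmoid(logits).shape)           # one probability per label
```

The attention weights double as an explanation: for each label, they show which token positions drove the decision.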
arXiv Detail & Related papers (2021-01-14T02:05:34Z)
- BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
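The core idea, self-attention over a patient's ordered visits, can be sketched with a stock Transformer encoder layer; the random visit embeddings and the pooling choice are assumptions, not the BiteNet architecture:

```python
import torch

num_visits, visit_dim = 6, 32

# Stand-in embeddings for one patient's sequence of visits.
visits = torch.randn(1, num_visits, visit_dim)  # (batch, visits, dim)

# A standard Transformer encoder layer plays the role of the temporal encoder.
encoder = torch.nn.TransformerEncoderLayer(
    d_model=visit_dim, nhead=4, batch_first=True)
contextual = encoder(visits)      # each visit now attends to the others

journey = contextual.mean(dim=1)  # pool visits into one journey vector
predictor = torch.nn.Linear(visit_dim, 1)
print(torch.sigmoid(predictor(journey)))  # e.g. risk of a future outcome
```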
arXiv Detail & Related papers (2020-09-24T00:42:36Z)