Related papers: Explainable ICD Coding via Entity Linking

URL: http://arxiv.org/abs/2503.20508v2
Date: Mon, 07 Apr 2025 13:54:13 GMT
Title: Explainable ICD Coding via Entity Linking
Authors: Leonor Barreiros, Isabel Coutinho, Gonçalo M. Correia, Bruno Martins
Abstract summary: Clinical coding is a critical task in healthcare. Traditional methods for automating clinical coding may not provide sufficient explicit evidence for coders in production environments. We propose to reframe the task as an entity linking problem, in which each document is annotated with its set of codes and respective textual evidence.
Score: 2.340984737655165
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Clinical coding is a critical task in healthcare, although traditional methods for automating clinical coding may not provide sufficient explicit evidence for coders in production environments. This evidence is crucial, as medical coders have to make sure there exists at least one explicit passage in the input health record that justifies the attribution of a code. We therefore propose to reframe the task as an entity linking problem, in which each document is annotated with its set of codes and respective textual evidence, enabling better human-machine collaboration. By leveraging parameter-efficient fine-tuning of Large Language Models (LLMs), together with constrained decoding, we introduce three approaches to solve this problem that prove effective at disambiguating clinical mentions and that perform well in few-shot scenarios.
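Illustrative sketch (not from the paper): the abstract mentions constrained decoding but gives no implementation details. One common way to constrain generation to a closed label set is a prefix trie over the valid code strings, so that at each step only continuations leading to a real code are allowed. The Python below assumes character-level tokens, a hypothetical `END` marker, and a hypothetical `score(prefix, token)` callable standing in for a language model; none of these names come from the source.

```python
from typing import Callable, Iterable, List

END = "<end>"  # hypothetical end-of-code marker, not from the paper


def build_trie(codes: Iterable[str]) -> dict:
    """Build a character-level prefix trie over the valid code strings."""
    root: dict = {}
    for code in codes:
        node = root
        for ch in code:
            node = node.setdefault(ch, {})
        node[END] = {}  # mark that a complete code ends here
    return root


def allowed_next(trie: dict, prefix: str) -> List[str]:
    """Return the tokens that may follow `prefix` (empty if prefix is invalid)."""
    node = trie
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    return sorted(node.keys())


def constrained_greedy_decode(trie: dict, score: Callable[[str, str], float]) -> str:
    """Greedy decoding restricted to trie paths.

    `score` is a stand-in for a model's next-token score (an assumption).
    """
    prefix = ""
    while True:
        options = allowed_next(trie, prefix)
        if not options:
            raise ValueError("no valid continuation; is the trie empty?")
        best = max(options, key=lambda tok: score(prefix, tok))
        if best == END:
            return prefix
        prefix += best
```

Because every step picks from `allowed_next`, the decoded string is always one of the codes used to build the trie, whatever the scoring function prefers. A real system would operate on the model's subword vocabulary rather than single characters.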

Related papers

The Anatomy of Evidence: An Investigation Into Explainable ICD Coding [0.0]
We conduct an in-depth analysis of the MDACE dataset and perform plausibility evaluation of current explainable medical coding systems. Our findings reveal that ground truth evidence aligns with code descriptions to a certain degree. An investigation into state-of-the-art approaches shows a high overlap with ground truth evidence.
arXiv Detail & Related papers (2025-07-02T15:21:29Z)
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding [49.56049319037421]
KodCode is a synthetic dataset that addresses the persistent challenge of acquiring high-quality, verifiable training data. It comprises question-solution-test triplets that are systematically validated via a self-verification procedure. This pipeline yields a large-scale, robust and diverse coding dataset.
arXiv Detail & Related papers (2025-03-04T19:17:36Z)
MKE-Coder: Multi-Axial Knowledge with Evidence Verification in ICD Coding for Chinese EMRs [9.615982382826768]
This paper introduces a novel framework called MKE-Coder: Multi-axial Knowledge with Evidence verification in ICD coding for Chinese EMRs. We identify candidate codes for the diagnosis and categorize each of them into knowledge under four coding axes. We retrieve corresponding clinical evidence from the comprehensive content of EMRs and filter credible evidence through a scoring model.
arXiv Detail & Related papers (2025-02-19T08:08:53Z)
Aligning AI Research with the Needs of Clinical Coding Workflows: Eight Recommendations Based on US Data Analysis and Critical Review [14.381199039813675]
This position paper aims to align AI coding research more closely with practical challenges of clinical coding. Based on our analysis, we offer eight specific recommendations, suggesting ways to improve current evaluation methods.
arXiv Detail & Related papers (2024-12-23T23:39:05Z)
MedChain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking [58.25862290294702]
We present MedChain, a dataset of 12,163 clinical cases that covers five key stages of clinical workflow. We also propose MedChain-Agent, an AI system that integrates a feedback mechanism and an MCase-RAG module to learn from previous cases and adapt its responses.
arXiv Detail & Related papers (2024-12-02T15:25:02Z)
Auxiliary Knowledge-Induced Learning for Automatic Multi-Label Medical Document Classification [22.323705343864336]
We propose a novel approach for ICD indexing that adopts three ideas. We use a multi-level deep dilated residual convolution encoder to aggregate the information from the clinical notes. We formalize the task of ICD classification with auxiliary knowledge of the medical records.
arXiv Detail & Related papers (2024-05-29T13:44:07Z)
CoRelation: Boosting Automatic ICD Coding Through Contextualized Code Relation Learning [56.782963838838036]
We propose a novel approach, a contextualized and flexible framework, to enhance the learning of ICD code representations. Our approach employs a dependent learning paradigm that considers the context of clinical notes in modeling all possible code relations.
arXiv Detail & Related papers (2024-02-24T03:25:28Z)
Automated clinical coding using off-the-shelf large language models [10.365958121087305]
The task of assigning diagnostic ICD codes to patient hospital admissions is typically performed by expert human coders. Efforts towards automated ICD coding are dominated by supervised deep learning models. In this work, we leverage off-the-shelf pre-trained generative large language models to develop a practical solution.
arXiv Detail & Related papers (2023-10-10T11:56:48Z)
ICDBigBird: A Contextual Embedding Model for ICD Code Classification [71.58299917476195]
Contextual word embedding models have achieved state-of-the-art results in multiple NLP tasks. ICDBigBird is a BigBird-based model which can integrate a Graph Convolutional Network (GCN). Our experiments on a real-world clinical dataset demonstrate the effectiveness of our BigBird-based model on the ICD classification task.
arXiv Detail & Related papers (2022-04-21T20:59:56Z)
A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding. These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information. Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
Medical Code Assignment with Gated Convolution and Note-Code Interaction [39.079615516043674]
We propose a novel method, gated convolutional neural networks with note-code interaction (GatedCNN-NCI), for automatic medical code assignment. With a novel note-code interaction design and a graph message passing mechanism, we explicitly capture the underlying dependency between notes and codes. Our proposed model outperforms state-of-the-art models in most cases, and our model size is on par with lightweight baselines.
arXiv Detail & Related papers (2020-10-14T11:37:24Z)
Learning Contextualized Document Representations for Healthcare Answer Retrieval [68.02029435111193]
Contextual Discourse Vectors (CDV) is a distributed document representation for efficient answer retrieval from long documents. Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse. We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking.
arXiv Detail & Related papers (2020-02-03T15:47:19Z)
