Automatic ICD-10 Code Association: A Challenging Task on French Clinical
Texts
- URL: http://arxiv.org/abs/2304.02886v1
- Date: Thu, 6 Apr 2023 06:31:54 GMT
- Title: Automatic ICD-10 Code Association: A Challenging Task on French Clinical
Texts
- Authors: Yakini Tchouka, Jean-François Couchot, David Laiymani, Philippe
Selles, Azzedine Rahmani
- Abstract summary: This paper adapts pre-trained language models to automatically associate the ICD codes.
We propose a model that combines the latest advances in NLP and multi-label classification for ICD-10 code association.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatically associating ICD codes with electronic health data is a
well-known NLP task in medical research. NLP has evolved significantly in
recent years with the emergence of pre-trained language models based on
Transformers architecture, mainly in the English language. This paper adapts
these models to automatically associate the ICD codes. Several neural network
architectures have been experimented with to address the challenges of dealing
with a large set of both input tokens and labels to be guessed. In this paper,
we propose a model that combines the latest advances in NLP and multi-label
classification for ICD-10 code association. Fair experiments on a Clinical
dataset in the French language show that our approach increases the $F_1$-score
metric by more than 55% compared to state-of-the-art results.
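To make the reported metric concrete, here is a minimal sketch of how an $F_1$-score can be computed for multi-label code assignment. The abstract does not say which averaging the authors use, so both micro- and macro-averaging are shown as assumptions; the function names and the example ICD-10 codes are purely illustrative and are not taken from the paper.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(gold, pred):
    """Return (micro_f1, macro_f1) for parallel lists of label sets.

    gold and pred hold one set of codes per document (hypothetical format).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for code in g & p:   # predicted and correct
            tp[code] += 1
        for code in p - g:   # predicted but wrong
            fp[code] += 1
        for code in g - p:   # missed
            fn[code] += 1
    labels = set(tp) | set(fp) | set(fn)
    # Micro: pool the counts over all codes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per code, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels) if labels else 0.0
    return micro, macro


# Illustrative documents and codes only.
gold = [{"I10", "E11.9"}, {"J18.9"}]
pred = [{"I10"}, {"J18.9", "I10"}]
micro, macro = multilabel_f1(gold, pred)
print(f"micro-F1={micro:.3f}, macro-F1={macro:.3f}")
```

With a large label set, the two averages can diverge sharply: micro-F1 is dominated by frequent codes, while macro-F1 weights rare codes equally.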
Related papers
- Synthetic Clinical Notes for Rare ICD Codes: A Data-Centric Framework for Long-Tail Medical Coding [26.840057002860235]
Thousands of rare and zero-shot ICD codes are severely underrepresented in datasets like MIMIC-III. We generate 90,000 synthetic notes covering 7,902 ICD codes, significantly expanding the training distribution. Experiments show that our approach modestly improves macro-F1 while maintaining strong micro-F1.
arXiv Detail & Related papers (2025-11-18T03:52:12Z) - Using LLMs for Multilingual Clinical Entity Linking to ICD-10 [3.7463543521744764]
We propose an approach for linking clinical terms to ICD-10 codes in different languages using Large Language Models (LLMs). Our system shows promising results in predicting ICD-10 codes on different benchmark datasets in Spanish.
arXiv Detail & Related papers (2025-09-05T07:30:40Z) - RuCCoD: Towards Automated ICD Coding in Russian [38.98810919082103]
We present a new dataset for ICD coding, annotated with over 10,000 entities and more than 1,500 unique ICD codes.
This dataset serves as a benchmark for several state-of-the-art models, including BERT, LLaMA with LoRA, and RAG.
Our experiments, conducted on a carefully curated test set, demonstrate that training with automatically predicted codes leads to a significant improvement in accuracy compared to manually annotated data from physicians.
arXiv Detail & Related papers (2025-02-28T17:40:24Z) - CoRelation: Boosting Automatic ICD Coding Through Contextualized Code
Relation Learning [56.782963838838036]
We propose a novel approach, a contextualized and flexible framework, to enhance the learning of ICD code representations.
Our approach employs a dependent learning paradigm that considers the context of clinical notes in modeling all possible code relations.
arXiv Detail & Related papers (2024-02-24T03:25:28Z) - Accurate and Well-Calibrated ICD Code Assignment Through Attention Over
Diverse Label Embeddings [1.201425717264024]
Manually assigning ICD codes to clinical text is time-consuming, error-prone, and expensive.
This paper describes a novel approach for automated ICD coding, combining several ideas from previous related work.
Experiments with different splits of the MIMIC-III dataset show that the proposed approach outperforms the current state-of-the-art models in ICD coding.
arXiv Detail & Related papers (2024-02-05T16:40:23Z) - GEC-DePenD: Non-Autoregressive Grammatical Error Correction with
Decoupled Permutation and Decoding [52.14832976759585]
Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models.
We propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network.
We show that the resulting network improves over previously known non-autoregressive methods for GEC.
arXiv Detail & Related papers (2023-11-14T14:24:36Z) - Multi-label Few-shot ICD Coding as Autoregressive Generation with Prompt [7.554528566861559]
This study transforms this multi-label classification task into an autoregressive generation task.
Instead of directly predicting the high dimensional space of ICD codes, our model generates the lower dimension of text descriptions.
Experiments on MIMIC-III-few show that our model achieves a macro F1 of 30.2, which substantially outperforms the previous MIMIC-III-full SOTA model.
arXiv Detail & Related papers (2022-11-24T22:10:50Z) - ICDBigBird: A Contextual Embedding Model for ICD Code Classification [71.58299917476195]
Contextual word embedding models have achieved state-of-the-art results in multiple NLP tasks.
ICDBigBird is a BigBird-based model which can integrate a Graph Convolutional Network (GCN).
Our experiments on a real-world clinical dataset demonstrate the effectiveness of our BigBird-based model on the ICD classification task.
arXiv Detail & Related papers (2022-04-21T20:59:56Z) - Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of
Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource and real-world challenge of code-mixed (Spanish-Catalan) clinical notes de-identification in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z) - A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z) - From Extreme Multi-label to Multi-class: A Hierarchical Approach for
Automated ICD-10 Coding Using Phrase-level Attention [4.387302129801651]
Clinical coding is the task of assigning a set of alphanumeric codes, referred to as ICD (International Classification of Diseases), to a medical event based on the context captured in a clinical narrative.
We propose a novel approach for automatic ICD coding by reformulating the extreme multi-label problem into a simpler multi-class problem using a hierarchical solution.
arXiv Detail & Related papers (2021-02-18T03:19:14Z) - Predicting Clinical Diagnosis from Patients Electronic Health Records
Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z) - A Label Attention Model for ICD Coding from Clinical Text [14.910833190248319]
We propose a new label attention model for automatic ICD coding.
It can handle both the various lengths and the interdependence of the ICD code related text fragments.
Our model achieves new state-of-the-art results on three benchmark MIMIC datasets.
arXiv Detail & Related papers (2020-07-13T12:42:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.