Towards Semi-Structured Automatic ICD Coding via Tree-based Contrastive
Learning
- URL: http://arxiv.org/abs/2310.09672v1
- Date: Sat, 14 Oct 2023 22:07:13 GMT
- Title: Towards Semi-Structured Automatic ICD Coding via Tree-based Contrastive
Learning
- Authors: Chang Lu, Chandan K. Reddy, Ping Wang, Yue Ning
- Abstract summary: We investigate the semi-structured nature of clinical notes and propose an automatic algorithm to segment them into sections.
To address the variability issues in existing ICD coding models with limited data, we introduce a contrastive pre-training approach on sections.
- Score: 18.380293890624102
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic coding of International Classification of Diseases (ICD) is a
multi-label text categorization task that involves extracting disease or
procedure codes from clinical notes. Despite the application of
state-of-the-art natural language processing (NLP) techniques, there are still
challenges including limited availability of data due to privacy constraints
and the high variability of clinical notes caused by different writing habits
of medical professionals and various pathological features of patients. In this
work, we investigate the semi-structured nature of clinical notes and propose
an automatic algorithm to segment them into sections. To address the
variability issues in existing ICD coding models with limited data, we
introduce a contrastive pre-training approach on sections using a soft
multi-label similarity metric based on tree edit distance. Additionally, we
design a masked section training strategy to enable ICD coding models to locate
sections related to ICD codes. Extensive experimental results demonstrate that
our proposed training strategies effectively enhance the performance of
existing ICD coding methods.
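To make the idea of a soft multi-label similarity more concrete, here is a minimal sketch. It is not the authors' metric. It assumes each ICD code expands into a small prefix hierarchy (for example `428.01` under `428.0` under `428`), and that the edit distance between two label trees allows only node insertions and deletions. Under that assumption the distance reduces to the size of the symmetric difference of the node sets. The names `ancestors`, `label_tree`, and `soft_similarity` are hypothetical.

```python
def ancestors(code: str) -> set[str]:
    """Return the code plus its prefix ancestors (assumed hierarchy).

    Example: "428.01" -> {"428.01", "428.0", "428"}.
    """
    nodes = {code}
    head, _, tail = code.partition(".")
    nodes.add(head)  # three-character category level
    for i in range(1, len(tail)):
        nodes.add(f"{head}.{tail[:i]}")  # intermediate sub-category levels
    return nodes


def label_tree(codes) -> set[str]:
    """Node set of the subtree spanned by a set of ICD codes."""
    nodes: set[str] = set()
    for code in codes:
        nodes |= ancestors(code)
    return nodes


def soft_similarity(codes_a, codes_b) -> float:
    """Similarity in [0, 1] from an insert/delete-only tree edit distance.

    Assumption: nodes are uniquely labeled, so the distance equals the
    number of nodes present in exactly one of the two trees.
    """
    tree_a, tree_b = label_tree(codes_a), label_tree(codes_b)
    if not tree_a and not tree_b:
        return 1.0  # two empty label sets are treated as identical
    distance = len(tree_a ^ tree_b)
    return 1.0 - distance / (len(tree_a) + len(tree_b))
```

With this sketch, two sections labeled `428.0` and `428.1` share the parent `428` and receive a similarity of 0.5, whereas `428.0` and `250.00` share no node and receive 0.0. An exact-match multi-label comparison would score both pairs as equally different.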
Related papers
- Auxiliary Knowledge-Induced Learning for Automatic Multi-Label Medical Document Classification [22.323705343864336]
We propose a novel approach for ICD indexing that adopts three ideas.
We use a multi-level deep dilated residual convolution encoder to aggregate the information from the clinical notes.
We formalize the task of ICD classification with auxiliary knowledge of the medical records.
arXiv Detail & Related papers (2024-05-29T13:44:07Z)
- CoRelation: Boosting Automatic ICD Coding Through Contextualized Code Relation Learning [56.782963838838036]
We propose a novel approach, a contextualized and flexible framework, to enhance the learning of ICD code representations.
Our approach employs a dependent learning paradigm that considers the context of clinical notes in modeling all possible code relations.
arXiv Detail & Related papers (2024-02-24T03:25:28Z)
- Accurate and Well-Calibrated ICD Code Assignment Through Attention Over Diverse Label Embeddings [1.201425717264024]
Manually assigning ICD codes to clinical text is time-consuming, error-prone, and expensive.
This paper describes a novel approach for automated ICD coding, combining several ideas from previous related work.
Experiments with different splits of the MIMIC-III dataset show that the proposed approach outperforms the current state-of-the-art models in ICD coding.
arXiv Detail & Related papers (2024-02-05T16:40:23Z)
- Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z)
- ICDBigBird: A Contextual Embedding Model for ICD Code Classification [71.58299917476195]
Contextual word embedding models have achieved state-of-the-art results in multiple NLP tasks.
ICDBigBird is a BigBird-based model that can integrate a Graph Convolutional Network (GCN).
Our experiments on a real-world clinical dataset demonstrate the effectiveness of our BigBird-based model on the ICD classification task.
arXiv Detail & Related papers (2022-04-21T20:59:56Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
- Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making.
We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)
- A Label Attention Model for ICD Coding from Clinical Text [14.910833190248319]
We propose a new label attention model for automatic ICD coding.
It can handle both the varying lengths and the interdependence of the text fragments related to ICD codes.
Our model achieves new state-of-the-art results on three benchmark MIMIC datasets.
arXiv Detail & Related papers (2020-07-13T12:42:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.