CODER: Knowledge infused cross-lingual medical term embedding for term
normalization
- URL: http://arxiv.org/abs/2011.02947v3
- Date: Tue, 18 May 2021 00:46:29 GMT
- Title: CODER: Knowledge infused cross-lingual medical term embedding for term
normalization
- Authors: Zheng Yuan and Zhengyun Zhao and Haixia Sun and Jiao Li and Fei Wang
and Sheng Yu
- Abstract summary: CODER is designed for medical term normalization by providing close vector representations for different terms.
We train CODER via contrastive learning on a medical knowledge graph (KG) named the Unified Medical Language System.
We evaluate CODER in zero-shot term normalization, semantic similarity, and relation classification benchmarks.
- Score: 7.516391006265378
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes CODER: contrastive learning on knowledge graphs for
cross-lingual medical term representation. CODER is designed for medical term
normalization by providing close vector representations for different terms
that represent the same or similar medical concepts with cross-lingual support.
We train CODER via contrastive learning on a medical knowledge graph (KG) named
the Unified Medical Language System, where similarities are calculated
utilizing both terms and relation triplets from KG. Training with relations
injects medical knowledge into embeddings and aims to provide potentially
better machine learning features. We evaluate CODER in zero-shot term
normalization, semantic similarity, and relation classification benchmarks,
which show that CODER outperforms various state-of-the-art biomedical word
embeddings, concept embeddings, and contextual embeddings. Our codes and models
are available at https://github.com/GanjinZero/CODER.
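The contrastive objective described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released code: synonym term pairs drawn from the KG serve as positives, and the other terms in the batch serve as in-batch negatives under an InfoNCE-style loss. The encoder, batch, and temperature value here are placeholder assumptions.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.07):
    """InfoNCE over a batch: row i of `anchors` and `positives` is a
    synonym pair; every other row serves as an in-batch negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature             # (n, n) scaled cosine sims
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # the correct match for anchor i is positive i (the diagonal)
    return float(-np.log(np.diag(probs)).mean())

rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 16))             # stand-in term embeddings
loss_aligned = info_nce_loss(batch, batch)   # perfectly aligned synonym pairs
loss_shuffled = info_nce_loss(batch, rng.normal(size=(8, 16)))
```

Minimizing this loss pulls synonym embeddings together while pushing non-synonyms apart, which is what makes the resulting vectors usable for zero-shot term normalization via nearest-neighbour lookup.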
Related papers
- Contrastive Learning with Counterfactual Explanations for Radiology Report Generation [83.30609465252441]
We propose a CounterFactual Explanations-based framework (CoFE) for radiology report generation.
Counterfactual explanations serve as a potent tool for understanding how decisions made by algorithms can be changed by asking "what if" scenarios.
Experiments on two benchmarks demonstrate that leveraging the counterfactual explanations enables CoFE to generate semantically coherent and factually complete reports.
arXiv Detail & Related papers (2024-07-19T17:24:25Z)
- Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
arXiv Detail & Related papers (2023-10-04T21:57:09Z)
- Knowledge Graph Embeddings for Multi-Lingual Structured Representations of Radiology Reports [40.606143019674654]
We introduce a novel lightweight graph-based embedding method specifically catering to radiology reports.
It takes into account the structure and composition of the report, while also connecting medical terms in the report.
We show the use of this embedding on two tasks, namely disease classification of X-ray reports and image classification.
arXiv Detail & Related papers (2023-09-02T11:46:41Z)
- KnowAugNet: Multi-Source Medical Knowledge Augmented Medication Prediction Network with Multi-Level Graph Contrastive Learning [8.71936906687061]
This paper proposes KnowAugNet, a multi-source medical knowledge augmented medication prediction network.
It captures the diverse relations between medical codes via a multi-level graph contrastive learning framework.
It can assist doctors in making informed medication decisions for patients according to electronic medical records.
arXiv Detail & Related papers (2022-04-25T15:47:41Z)
- Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations [0.8154691566915505]
State-of-the-art term embeddings leverage pretrained language models to encode terms and use synonyms and relation knowledge from knowledge graphs to guide contrastive learning.
However, these embeddings are not sensitive to minor textual differences, which leads to failures in biomedical term clustering.
To alleviate this problem, we adjust the sampling strategy in pretraining term embeddings by providing dynamic hard positive and negative samples.
We name our proposed method CODER++, and it has been applied to clustering biomedical concepts in the newly released Biomedical Knowledge Graph BIOS.
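The dynamic hard-sample strategy mentioned above can be illustrated with a hypothetical sketch (not the CODER++ code): after each encoding pass, the nearest neighbour of a term that is not a known synonym is taken as a hard negative for the next training step. The toy embeddings and synonym sets below are invented for illustration.

```python
import numpy as np

def mine_hard_negatives(embeddings, synonym_sets):
    """For each term i, return the index of the most similar term that is
    NOT in synonym_sets[i] (which includes i itself)."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ e.T                               # pairwise cosine similarity
    hard = []
    for i in range(len(e)):
        ranked = np.argsort(-sims[i])            # most similar first
        hard.append(next(int(j) for j in ranked if int(j) not in synonym_sets[i]))
    return hard

# Toy example: terms 0 and 1 are synonyms; term 2 is nearby in the
# embedding space but is a different concept, so it becomes the hard negative.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.6], [0.0, 1.0]])
syn = [{0, 1}, {0, 1}, {2}, {3}]
hard = mine_hard_negatives(emb, syn)
```

Because the mined negatives are the terms the current model confuses most, training against them forces the embeddings to become sensitive to exactly the minor textual differences that plain synonym sampling misses.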
arXiv Detail & Related papers (2022-04-01T12:30:58Z)
- Clinical Named Entity Recognition using Contextualized Token Representations [49.036805795072645]
This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Experiments show that our models achieve dramatic improvements compared to both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z)
- Zero-shot Medical Entity Retrieval without Annotation: Learning From Rich Knowledge Graph Semantics [5.2710726359379265]
Current approaches tend to work well on specific medical domains but poorly generalize to unseen sub-specialties.
This is of increasing concern under a public health crisis as new medical conditions and drug treatments come to light frequently.
Medical knowledge graphs (KGs) contain rich semantics, including large numbers of synonyms as well as curated graph structures.
arXiv Detail & Related papers (2021-05-26T16:53:48Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
- Unifying Relational Sentence Generation and Retrieval for Medical Image Report Composition [142.42920413017163]
Current methods often generate the most common sentences for an individual case due to dataset bias.
We propose a novel framework that unifies template retrieval and sentence generation to handle both common and rare abnormalities.
arXiv Detail & Related papers (2021-01-09T04:33:27Z)
- Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)
- Can Embeddings Adequately Represent Medical Terminology? New Large-Scale Medical Term Similarity Datasets Have the Answer! [13.885093944392464]
A large number of embeddings trained on medical data have emerged, but it remains unclear how well they represent medical terminology.
We present multiple automatically created large-scale medical term similarity datasets.
We evaluate state-of-the-art word and contextual embeddings on our new datasets, comparing multiple vector similarity metrics and word vector aggregation techniques.
arXiv Detail & Related papers (2020-03-24T19:18:34Z)
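The evaluation protocol in the last entry can be sketched as a toy (placeholder vectors rather than real embeddings, with no tie handling in the rank correlation): aggregate word vectors into a term vector, score term pairs by cosine similarity, and correlate the scores with gold similarity ratings via Spearman rank correlation.

```python
import numpy as np

def term_vector(word_vectors):
    """Mean-pooling aggregation of word vectors into one term vector."""
    return np.mean(word_vectors, axis=0)

def cosine(u, v):
    """Cosine similarity between two term vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def spearman(x, y):
    """Spearman rank correlation (no tie handling, for illustration):
    Pearson correlation of the rank transforms of x and y."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

# Perfectly monotone predicted scores correlate 1.0 with gold ratings.
gold = np.array([1.0, 2.5, 4.0])
pred = np.array([0.2, 0.5, 0.9])
rho = spearman(pred, gold)
```

Comparing aggregation choices (mean vs. max pooling) and similarity metrics then amounts to recomputing the same correlation under each configuration.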
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.