CODER: Knowledge infused cross-lingual medical term embedding for term
normalization
- URL: http://arxiv.org/abs/2011.02947v3
- Date: Tue, 18 May 2021 00:46:29 GMT
- Title: CODER: Knowledge infused cross-lingual medical term embedding for term
normalization
- Authors: Zheng Yuan and Zhengyun Zhao and Haixia Sun and Jiao Li and Fei Wang
and Sheng Yu
- Abstract summary: CODER is designed for medical term normalization by providing close vector representations for different terms.
We train CODER via contrastive learning on a medical knowledge graph (KG) named the Unified Medical Language System.
We evaluate CODER in zero-shot term normalization, semantic similarity, and relation classification benchmarks.
- Score: 7.516391006265378
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes CODER: contrastive learning on knowledge graphs for
cross-lingual medical term representation. CODER is designed for medical term
normalization by providing close vector representations for different terms
that represent the same or similar medical concepts with cross-lingual support.
We train CODER via contrastive learning on a medical knowledge graph (KG) named
the Unified Medical Language System, where similarities are calculated
utilizing both terms and relation triplets from KG. Training with relations
injects medical knowledge into embeddings and aims to provide potentially
better machine learning features. We evaluate CODER in zero-shot term
normalization, semantic similarity, and relation classification benchmarks,
which show that CODER outperforms various state-of-the-art biomedical word
embeddings, concept embeddings, and contextual embeddings. Our codes and models
are available at https://github.com/GanjinZero/CODER.
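The contrastive objective described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released code: synonym term pairs drawn from the KG serve as positives, and the other terms in the batch serve as in-batch negatives under an InfoNCE-style loss. The encoder, batch, and temperature value here are placeholder assumptions.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.07):
    """InfoNCE over a batch: row i of `anchors` and `positives` is a
    synonym pair; every other row serves as an in-batch negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature             # (n, n) scaled cosine sims
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # the correct match for anchor i is positive i (the diagonal)
    return float(-np.log(np.diag(probs)).mean())

rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 16))             # stand-in term embeddings
loss_aligned = info_nce_loss(batch, batch)   # perfectly aligned synonym pairs
loss_shuffled = info_nce_loss(batch, rng.normal(size=(8, 16)))
```

Minimizing this loss pulls synonym embeddings together while pushing non-synonyms apart, which is what makes the resulting vectors usable for zero-shot term normalization via nearest-neighbour lookup.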
Related papers
- Contrastive Learning with Counterfactual Explanations for Radiology Report Generation [83.30609465252441]
We propose a CounterFactual Explanations-based framework (CoFE) for radiology report generation.
Counterfactual explanations serve as a potent tool for understanding how decisions made by algorithms can be changed by asking "what if" scenarios.
Experiments on two benchmarks demonstrate that leveraging the counterfactual explanations enables CoFE to generate semantically coherent and factually complete reports.
arXiv Detail & Related papers (2024-07-19T17:24:25Z)
- Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
arXiv Detail & Related papers (2023-10-04T21:57:09Z)
- Knowledge Graph Embeddings for Multi-Lingual Structured Representations of Radiology Reports [40.606143019674654]
We introduce a novel lightweight graph-based embedding method specifically catering to radiology reports.
It takes into account the structure and composition of the report, while also connecting medical terms in the report.
We show the use of this embedding on two tasks, namely disease classification of X-ray reports and image classification.
arXiv Detail & Related papers (2023-09-02T11:46:41Z)
- KnowAugNet: Multi-Source Medical Knowledge Augmented Medication Prediction Network with Multi-Level Graph Contrastive Learning [8.71936906687061]
This paper proposes KnowAugNet, a multi-source medical knowledge augmented medication prediction network.
It captures the diverse relations between medical codes via a multi-level graph contrastive learning framework.
It can assist doctors in making informed medication decisions for patients according to electronic medical records.
arXiv Detail & Related papers (2022-04-25T15:47:41Z)
- Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations [0.8154691566915505]
State-of-the-art term embeddings leverage pretrained language models to encode terms and use synonyms and relation knowledge from knowledge graphs to guide contrastive learning.
However, these embeddings are not sensitive to minor textual differences, which leads to failures in biomedical term clustering.
To alleviate this problem, we adjust the sampling strategy in pretraining term embeddings by providing dynamic hard positive and negative samples.
We name our proposed method CODER++, and it has been applied to clustering biomedical concepts in the newly released Biomedical Knowledge Graph BIOS.
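The dynamic hard-sample strategy mentioned above can be illustrated with a hypothetical sketch (not the CODER++ code): after each encoding pass, the nearest neighbour of a term that is not a known synonym is taken as a hard negative for the next training step. The toy embeddings and synonym sets below are invented for illustration.

```python
import numpy as np

def mine_hard_negatives(embeddings, synonym_sets):
    """For each term i, return the index of the most similar term that is
    NOT in synonym_sets[i] (which includes i itself)."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ e.T                               # pairwise cosine similarity
    hard = []
    for i in range(len(e)):
        ranked = np.argsort(-sims[i])            # most similar first
        hard.append(next(int(j) for j in ranked if int(j) not in synonym_sets[i]))
    return hard

# Toy example: terms 0 and 1 are synonyms; term 2 is nearby in the
# embedding space but is a different concept, so it becomes the hard negative.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.6], [0.0, 1.0]])
syn = [{0, 1}, {0, 1}, {2}, {3}]
hard = mine_hard_negatives(emb, syn)
```

Because the mined negatives are the terms the current model confuses most, training against them forces the embeddings to become sensitive to exactly the minor textual differences that plain synonym sampling misses.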
arXiv Detail & Related papers (2022-04-01T12:30:58Z)
- Clinical Named Entity Recognition using Contextualized Token Representations [49.036805795072645]
This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Experiments show that our models achieve dramatic improvements compared to both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z)
- Zero-shot Medical Entity Retrieval without Annotation: Learning From Rich Knowledge Graph Semantics [5.2710726359379265]
Current approaches tend to work well on specific medical domains but poorly generalize to unseen sub-specialties.
This is of increasing concern under a public health crisis as new medical conditions and drug treatments come to light frequently.
Medical knowledge graphs (KGs) contain rich semantics, including large numbers of synonyms as well as curated graph structures.
arXiv Detail & Related papers (2021-05-26T16:53:48Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
- Unifying Relational Sentence Generation and Retrieval for Medical Image Report Composition [142.42920413017163]
Current methods often generate the most common sentences for an individual case due to dataset bias.
We propose a novel framework that unifies template retrieval and sentence generation to handle both common and rare abnormalities.
arXiv Detail & Related papers (2021-01-09T04:33:27Z)
- Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)
- Can Embeddings Adequately Represent Medical Terminology? New Large-Scale Medical Term Similarity Datasets Have the Answer! [13.885093944392464]
A large number of embeddings trained on medical data have emerged, but it remains unclear how well they represent medical terminology.
We present multiple automatically created large-scale medical term similarity datasets.
We evaluate state-of-the-art word and contextual embeddings on our new datasets, comparing multiple vector similarity metrics and word vector aggregation techniques.
arXiv Detail & Related papers (2020-03-24T19:18:34Z)
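The evaluation protocol in the last entry can be sketched as a toy (placeholder vectors rather than real embeddings, with no tie handling in the rank correlation): aggregate word vectors into a term vector, score term pairs by cosine similarity, and correlate the scores with gold similarity ratings via Spearman rank correlation.

```python
import numpy as np

def term_vector(word_vectors):
    """Mean-pooling aggregation of word vectors into one term vector."""
    return np.mean(word_vectors, axis=0)

def cosine(u, v):
    """Cosine similarity between two term vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def spearman(x, y):
    """Spearman rank correlation (no tie handling, for illustration):
    Pearson correlation of the rank transforms of x and y."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

# Perfectly monotone predicted scores correlate 1.0 with gold ratings.
gold = np.array([1.0, 2.5, 4.0])
pred = np.array([0.2, 0.5, 0.9])
rho = spearman(pred, gold)
```

Comparing aggregation choices (mean vs. max pooling) and similarity metrics then amounts to recomputing the same correlation under each configuration.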
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.