Related papers: Knowledge-Base Enriched Word Embeddings for Biomedical Domain

Knowledge-Base Enriched Word Embeddings for Biomedical Domain

URL: http://arxiv.org/abs/2103.00479v1
Date: Sat, 20 Feb 2021 18:18:51 GMT
Title: Knowledge-Base Enriched Word Embeddings for Biomedical Domain
Authors: Kishlay Jha
Abstract summary: We propose a new word embedding based model for biomedical domain that jointly leverages the information from available corpora and domain knowledge. Unlike existing approaches, the proposed methodology is simple but adept at capturing the precise knowledge available in domain resources in an accurate way.
Score: 5.086571902225929
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Word embeddings have been shown adept at capturing the semantic and syntactic regularities of the natural language text, as a result of which these representations have found their utility in a wide variety of downstream content analysis tasks. Commonly, these word embedding techniques derive the distributed representation of words based on the local context information. However, such approaches ignore the rich amount of explicit information present in knowledge-bases. This is problematic, as it might lead to poor representation for words with insufficient local context such as domain specific words. Furthermore, the problem becomes pronounced in domain such as bio-medicine where the presence of these domain specific words are relatively high. Towards this end, in this project, we propose a new word embedding based model for biomedical domain that jointly leverages the information from available corpora and domain knowledge in order to generate knowledge-base powered embeddings. Unlike existing approaches, the proposed methodology is simple but adept at capturing the precise knowledge available in domain resources in an accurate way. Experimental results on biomedical concept similarity and relatedness task validates the effectiveness of the proposed approach.

Related papers

Domain Lexical Knowledge-based Word Embedding Learning for Text Classification under Small Data [9.531822246256928]
The root cause of the problem is that the context-based BERT embedding of the keywords may not be discriminative enough to produce discriminative text representation for classification.<n>Motivated by this finding, we develop a method to enhance word embeddings using domain-specific lexical knowledge.<n>The knowledge-based embedding enhancement model projects the BERT embedding into a new space where within-class similarity and between-class difference are maximized.
arXiv Detail & Related papers (2025-06-02T12:59:41Z)
Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG)<n>MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner.<n>We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z)
DARE: Towards Robust Text Explanations in Biomedical and Healthcare Applications [54.93807822347193]
We show how to adapt attribution robustness estimation methods to a given domain, so as to take into account domain-specific plausibility. Next, we provide two methods, adversarial training and FAR training, to mitigate the brittleness characterized by DARE. Finally, we empirically validate our methods with extensive experiments on three established biomedical benchmarks.
arXiv Detail & Related papers (2023-07-05T08:11:40Z)
EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations. Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
Clinical Named Entity Recognition using Contextualized Token Representations [49.036805795072645]
This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context. We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair) Explicit experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z)
KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction [111.74812895391672]
We propose a Knowledge-aware Prompt-tuning approach with synergistic optimization (KnowPrompt) We inject latent knowledge contained in relation labels into prompt construction with learnable virtual type words and answer words.
arXiv Detail & Related papers (2021-04-15T17:57:43Z)
Integration of Domain Knowledge using Medical Knowledge Graph Deep Learning for Cancer Phenotyping [6.077023952306772]
We propose a method to integrate external knowledge from medical terminology into the context captured by word embeddings. We evaluate the proposed approach using a Multitask Convolutional Neural Network (MT-CNN) to extract six cancer characteristics from a dataset of 900K cancer pathology reports.
arXiv Detail & Related papers (2021-01-05T03:59:43Z)
UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus [73.86656026386038]
We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process. By applying these two strategies, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models.
arXiv Detail & Related papers (2020-10-20T15:56:31Z)
On the Effects of Knowledge-Augmented Data in Word Embeddings [0.6749750044497732]
We propose a novel approach for linguistic knowledge injection through data augmentation to learn word embeddings. We show our knowledge augmentation approach improves the intrinsic characteristics of the learned embeddings while not significantly altering their results on a downstream text classification task.
arXiv Detail & Related papers (2020-10-05T02:14:13Z)
A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction. We show that different embedding spaces have different degrees of strength for the structural and semantic properties. These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
On the Combined Use of Extrinsic Semantic Resources for Medical Information Search [0.0]
We develop a framework to highlight and expand head medical concepts in verbose medical queries. We also build semantically enhanced inverted index documents. To demonstrate the effectiveness of the proposed approach, we conducted several experiments over the CLEF 2014 dataset.
arXiv Detail & Related papers (2020-05-17T14:18:04Z)
Evaluating Sparse Interpretable Word Embeddings for Biomedical Domain [1.3526604206343171]
Interpretability is a key means to justification which is an integral part when it comes to biomedical applications. We present an inclusive study on interpretability of word embeddings in the medical domain, focusing on the role of sparse methods. Based on our experiments, it is seen that sparse word vectors show far more interpretability while preserving the performance of their original vectors in downstream tasks.
arXiv Detail & Related papers (2020-05-11T13:56:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.