Knowledge-Base Enriched Word Embeddings for Biomedical Domain
- URL: http://arxiv.org/abs/2103.00479v1
- Date: Sat, 20 Feb 2021 18:18:51 GMT
- Title: Knowledge-Base Enriched Word Embeddings for Biomedical Domain
- Authors: Kishlay Jha
- Abstract summary: We propose a new word embedding based model for biomedical domain that jointly leverages the information from available corpora and domain knowledge.
Unlike existing approaches, the proposed methodology is simple but adept at capturing the precise knowledge available in domain resources in an accurate way.
- Score: 5.086571902225929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Word embeddings have been shown adept at capturing the semantic and syntactic
regularities of the natural language text, as a result of which these
representations have found their utility in a wide variety of downstream
content analysis tasks. Commonly, these word embedding techniques derive the
distributed representation of words based on the local context information.
However, such approaches ignore the rich amount of explicit information present
in knowledge-bases. This is problematic, as it might lead to poor
representation for words with insufficient local context such as domain
specific words. Furthermore, the problem becomes pronounced in domain such as
bio-medicine where the presence of these domain specific words are relatively
high. Towards this end, in this project, we propose a new word embedding based
model for biomedical domain that jointly leverages the information from
available corpora and domain knowledge in order to generate knowledge-base
powered embeddings. Unlike existing approaches, the proposed methodology is
simple but adept at capturing the precise knowledge available in domain
resources in an accurate way. Experimental results on biomedical concept
similarity and relatedness task validates the effectiveness of the proposed
approach.
Related papers
- DARE: Towards Robust Text Explanations in Biomedical and Healthcare
Applications [54.93807822347193]
We show how to adapt attribution robustness estimation methods to a given domain, so as to take into account domain-specific plausibility.
Next, we provide two methods, adversarial training and FAR training, to mitigate the brittleness characterized by DARE.
Finally, we empirically validate our methods with extensive experiments on three established biomedical benchmarks.
arXiv Detail & Related papers (2023-07-05T08:11:40Z) - EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z) - Clinical Named Entity Recognition using Contextualized Token
Representations [49.036805795072645]
This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair)
Explicit experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z) - KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization
for Relation Extraction [111.74812895391672]
We propose a Knowledge-aware Prompt-tuning approach with synergistic optimization (KnowPrompt)
We inject latent knowledge contained in relation labels into prompt construction with learnable virtual type words and answer words.
arXiv Detail & Related papers (2021-04-15T17:57:43Z) - Integration of Domain Knowledge using Medical Knowledge Graph Deep
Learning for Cancer Phenotyping [6.077023952306772]
We propose a method to integrate external knowledge from medical terminology into the context captured by word embeddings.
We evaluate the proposed approach using a Multitask Convolutional Neural Network (MT-CNN) to extract six cancer characteristics from a dataset of 900K cancer pathology reports.
arXiv Detail & Related papers (2021-01-05T03:59:43Z) - UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual
Embeddings Using the Unified Medical Language System Metathesaurus [73.86656026386038]
We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process.
By applying these two strategies, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models.
arXiv Detail & Related papers (2020-10-20T15:56:31Z) - On the Effects of Knowledge-Augmented Data in Word Embeddings [0.6749750044497732]
We propose a novel approach for linguistic knowledge injection through data augmentation to learn word embeddings.
We show our knowledge augmentation approach improves the intrinsic characteristics of the learned embeddings while not significantly altering their results on a downstream text classification task.
arXiv Detail & Related papers (2020-10-05T02:14:13Z) - A Comparative Study on Structural and Semantic Properties of Sentence
Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z) - On the Combined Use of Extrinsic Semantic Resources for Medical
Information Search [0.0]
We develop a framework to highlight and expand head medical concepts in verbose medical queries.
We also build semantically enhanced inverted index documents.
To demonstrate the effectiveness of the proposed approach, we conducted several experiments over the CLEF 2014 dataset.
arXiv Detail & Related papers (2020-05-17T14:18:04Z) - Evaluating Sparse Interpretable Word Embeddings for Biomedical Domain [1.3526604206343171]
Interpretability is a key means to justification which is an integral part when it comes to biomedical applications.
We present an inclusive study on interpretability of word embeddings in the medical domain, focusing on the role of sparse methods.
Based on our experiments, it is seen that sparse word vectors show far more interpretability while preserving the performance of their original vectors in downstream tasks.
arXiv Detail & Related papers (2020-05-11T13:56:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.