Leveraging knowledge graphs to update scientific word embeddings using
latent semantic imputation
- URL: http://arxiv.org/abs/2210.15358v1
- Date: Thu, 27 Oct 2022 12:15:26 GMT
- Title: Leveraging knowledge graphs to update scientific word embeddings using
latent semantic imputation
- Authors: Jason Hoelscher-Obermaier, Edward Stevinson, Valentin Stauber, Ivaylo
Zhelev, Victor Botev, Ronin Wu, Jeremy Minton
- Abstract summary: We show how latent semantic imputation (LSI) can impute embeddings for domain-specific words from up-to-date knowledge graphs.
We show that LSI can produce reliable embedding vectors for rare and OOV terms in the biomedical domain.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The most interesting words in scientific texts will often be novel or rare.
This presents a challenge for scientific word embedding models to determine
quality embedding vectors for useful terms that are infrequent or newly
emerging. We demonstrate how latent semantic imputation (LSI) can address this problem by imputing
embeddings for domain-specific words from up-to-date knowledge graphs while
otherwise preserving the original word embedding model. We use the MeSH
knowledge graph to impute embedding vectors for biomedical terminology without
retraining and evaluate the resulting embedding model on a domain-specific
word-pair similarity task. We show that LSI can produce reliable embedding
vectors for rare and OOV terms in the biomedical domain.
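Below is a minimal sketch of the LSI idea from the abstract: non-negative reconstruction weights are learned over a k-nearest-neighbour graph in the knowledge-graph (domain) space, and embeddings for unknown words are then imputed by power iteration while the known vectors stay fixed. The `domain` and `emb` dictionaries and the function name are assumptions of this sketch; the paper also augments the kNN graph with a minimum spanning tree to guarantee connectivity, which is omitted here.

```python
# Sketch of latent semantic imputation (LSI). Assumed inputs:
#   domain[w]: feature vector for word w from a knowledge-graph embedding
#              (e.g. node2vec over MeSH); covers known and unknown words
#   emb[w]:    original word-embedding vector, available only for known words
import numpy as np
from scipy.optimize import nnls

def lsi_impute(domain, emb, k=10, n_iter=100):
    words = list(domain)
    D = np.array([domain[w] for w in words])
    known = [i for i, w in enumerate(words) if w in emb]
    dim = len(next(iter(emb.values())))
    n = len(words)

    # 1. kNN graph in the domain space with non-negative reconstruction
    #    weights: minimise ||d_i - sum_j w_ij d_j|| subject to w_ij >= 0.
    W = np.zeros((n, n))
    for i in range(n):
        dists = np.linalg.norm(D - D[i], axis=1)
        dists[i] = np.inf
        nbrs = np.argsort(dists)[:k]
        w, _ = nnls(D[nbrs].T, D[i])
        if w.sum() > 0:
            W[i, nbrs] = w / w.sum()   # row-normalise -> Markov-like matrix

    # 2. Power iteration: unknown rows become weighted averages of their
    #    neighbours; known rows are clamped at every step.
    X = np.zeros((n, dim))
    for i in known:
        X[i] = emb[words[i]]
    for _ in range(n_iter):
        X_new = W @ X
        X_new[known] = X[known]
        X = X_new
    return {w: X[i] for i, w in enumerate(words)}
```

Clamping the known rows at every step is what makes the method non-destructive: the original embedding model is preserved exactly, as the abstract requires.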
Related papers
- Always Keep your Target in Mind: Studying Semantics and Improving
Performance of Neural Lexical Substitution [124.99894592871385]
We present a large-scale comparative study of lexical substitution methods employing both older and the most recent language models.
We show that the already competitive results achieved by SOTA LMs/MLMs can be improved substantially if information about the target word is injected properly.
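As a minimal sketch of target-word injection with a generic masked LM from the `transformers` library: plain masking discards the target, whereas a pattern such as "<target> (or even [MASK])" keeps it visible to the model. The pattern here is one illustrative strategy, not necessarily the one the paper finds best.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def substitutes(sentence, target, top_k=5):
    mask = fill.tokenizer.mask_token
    masked = sentence.replace(target, mask, 1)                            # no target info
    injected = sentence.replace(target, f"{target} (or even {mask})", 1)  # target injected
    return ([p["token_str"] for p in fill(masked, top_k=top_k)],
            [p["token_str"] for p in fill(injected, top_k=top_k)])

plain, with_target = substitutes("The play got a great review.", "great")
```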
arXiv Detail & Related papers (2022-06-07T16:16:19Z)
- Automatic Biomedical Term Clustering by Learning Fine-grained Term
Representations [0.8154691566915505]
State-of-the-art term embeddings leverage pretrained language models to encode terms and use synonyms and relation knowledge from knowledge graphs to guide contrastive learning.
However, these embeddings are not sensitive to minor textual differences, which leads to failures in biomedical term clustering.
To alleviate this problem, we adjust the sampling strategy in pretraining term embeddings by providing dynamic hard positive and negative samples.
We name our proposed method CODER++; it has been applied to clustering biomedical concepts in the newly released biomedical knowledge graph BIOS.
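A minimal sketch of the dynamic hard-example idea: positives and negatives are re-mined from the current encoder every epoch, so examples get harder as training progresses. `encode` and `synonyms` are hypothetical stand-ins for the paper's term encoder and knowledge-graph synonym lookup.

```python
import numpy as np

def mine_hard_examples(terms, encode, synonyms, n_neg=10):
    X = encode(terms)                 # assumed: (n_terms, dim), L2-normalised
    sims = X @ X.T                    # cosine similarities
    triples = []
    for i, t in enumerate(terms):
        order = np.argsort(-sims[i])  # most similar first
        syn = synonyms(t)             # assumed: synonym set from the KG
        # hard positives: true synonyms the encoder currently ranks lowest
        pos = [terms[j] for j in order[::-1] if j != i and terms[j] in syn][:1]
        # hard negatives: non-synonyms the encoder currently ranks highest
        neg = [terms[j] for j in order if j != i and terms[j] not in syn][:n_neg]
        if pos:
            triples.append((t, pos[0], neg))
    return triples  # re-mined every epoch, hence "dynamic"
```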
arXiv Detail & Related papers (2022-04-01T12:30:58Z)
- Taxonomy Enrichment with Text and Graph Vector Representations [61.814256012166794]
We address the problem of taxonomy enrichment which aims at adding new words to the existing taxonomy.
We present a new method that achieves strong results on this task with little effort.
We achieve state-of-the-art results across different datasets and provide an in-depth error analysis of mistakes.
arXiv Detail & Related papers (2022-01-21T09:01:12Z)
- Hierarchical Heterogeneous Graph Representation Learning for Short Text
Classification [60.233529926965836]
We propose a new method called SHINE, based on graph neural networks (GNNs), for short text classification.
First, we model the short text dataset as a hierarchical heterogeneous graph consisting of word-level component graphs.
Then, we dynamically learn a short document graph that facilitates effective label propagation among similar short texts.
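SHINE's learned document graph and GNN layers are not reproduced here; as a generic illustration of the final label-propagation step only, assuming a row-normalised document-similarity matrix `W` and partial one-hot labels `Y`:

```python
import numpy as np

def propagate_labels(W, Y, labelled, n_iter=20):
    # W: (n_docs, n_docs) row-normalised similarity graph
    # Y: (n_docs, n_classes) one-hot rows for labelled docs, zeros otherwise
    # labelled: index array of documents whose labels are known
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = W @ F                  # spread label mass to similar documents
        F[labelled] = Y[labelled]  # clamp the supervised documents
    return F.argmax(axis=1)        # predicted class per document
```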
arXiv Detail & Related papers (2021-10-30T05:33:05Z)
- Clinical Named Entity Recognition using Contextualized Token
Representations [49.036805795072645]
This paper introduces contextualized word embeddings to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Experiments show that our models achieve dramatic improvements compared to both static word embeddings and domain-generic language models.
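The clinical models themselves are not reproduced here; as a sketch of how contextualized token embeddings are obtained in practice, here is analogous usage of generic pretrained Flair embeddings from the `flair` library (the paper's C-ELMo and C-Flair are clinical-domain counterparts):

```python
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, StackedEmbeddings

# forward + backward character LMs, stacked per token
emb = StackedEmbeddings([FlairEmbeddings("news-forward"),
                         FlairEmbeddings("news-backward")])

sentence = Sentence("The patient was discharged on oral antibiotics.")
emb.embed(sentence)  # token embeddings depend on the surrounding context
for token in sentence:
    print(token.text, token.embedding.shape)
```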
arXiv Detail & Related papers (2021-06-23T18:12:58Z)
- KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization
for Relation Extraction [111.74812895391672]
We propose a Knowledge-aware Prompt-tuning approach with synergistic optimization (KnowPrompt).
We inject latent knowledge contained in relation labels into prompt construction with learnable virtual type words and answer words.
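A minimal sketch of one ingredient of this idea, initialising a learnable virtual type word from the tokens of a relation label, using a generic BERT embedding table; the full KnowPrompt objective and training loop are not reproduced:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
table = model.get_input_embeddings().weight  # (vocab_size, hidden_dim)

def virtual_word(label_text):
    # Knowledge-aware init: average the embeddings of the label's tokens,
    # then let the resulting vector be trained jointly with the prompt.
    ids = tok(label_text, add_special_tokens=False)["input_ids"]
    return torch.nn.Parameter(table[ids].mean(dim=0).detach().clone())

birth_type = virtual_word("place of birth")  # one learnable virtual type word
```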
arXiv Detail & Related papers (2021-04-15T17:57:43Z)
- Knowledge-Base Enriched Word Embeddings for Biomedical Domain [5.086571902225929]
We propose a new word embedding based model for biomedical domain that jointly leverages the information from available corpora and domain knowledge.
Unlike existing approaches, the proposed methodology is simple yet adept at accurately capturing the precise knowledge available in domain resources.
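The paper's joint model is not reproduced here; as an illustration of the general idea of pulling corpus-trained vectors toward domain-knowledge neighbours, here is a retrofitting-style sketch (in the manner of Faruqui et al., 2015), with `emb` and `kb_neighbours` as assumed inputs:

```python
import numpy as np

def retrofit(emb, kb_neighbours, alpha=1.0, beta=1.0, n_iter=10):
    # emb: word -> corpus-trained vector; kb_neighbours: word -> related terms
    new = {w: v.copy() for w, v in emb.items()}
    for _ in range(n_iter):
        for w, v in emb.items():
            nbrs = [u for u in kb_neighbours.get(w, ()) if u in new]
            if not nbrs:
                continue
            # closed-form update balancing the corpus vector (alpha)
            # against the knowledge-base neighbours (beta)
            new[w] = (alpha * v + beta * sum(new[u] for u in nbrs)) \
                     / (alpha + beta * len(nbrs))
    return new
```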
arXiv Detail & Related papers (2021-02-20T18:18:51Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for
Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- Benchmark and Best Practices for Biomedical Knowledge Graph Embeddings [8.835844347471626]
We train several state-of-the-art knowledge graph embedding models on the SNOMED-CT knowledge graph.
We make a case for the importance of leveraging the multi-relational nature of knowledge graphs for learning biomedical knowledge representation.
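A minimal sketch of TransE, the simplest member of the multi-relational model families such benchmarks cover; `triples` is assumed to be a list of (head, relation, tail) ID triples extracted from a KG such as SNOMED-CT:

```python
import numpy as np

def train_transe(triples, n_ent, n_rel, dim=50, lr=0.01, margin=1.0,
                 epochs=100, seed=0):
    rng = np.random.default_rng(seed)
    E = rng.normal(scale=0.1, size=(n_ent, dim))  # entity embeddings
    R = rng.normal(scale=0.1, size=(n_rel, dim))  # relation embeddings
    for _ in range(epochs):
        for h, r, t in triples:
            t_neg = rng.integers(n_ent)        # corrupt the tail
            d_pos = E[h] + R[r] - E[t]         # should be near zero
            d_neg = E[h] + R[r] - E[t_neg]     # should be large
            # margin hinge on squared distances; SGD step when violated
            if margin + d_pos @ d_pos - d_neg @ d_neg > 0:
                E[h] -= lr * 2 * (d_pos - d_neg)
                E[t] += lr * 2 * d_pos
                E[t_neg] -= lr * 2 * d_neg
                R[r] -= lr * 2 * (d_pos - d_neg)
        E /= np.linalg.norm(E, axis=1, keepdims=True)  # entities on the unit sphere
    return E, R
```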
arXiv Detail & Related papers (2020-06-24T14:47:33Z)
- Evaluating Sparse Interpretable Word Embeddings for Biomedical Domain [1.3526604206343171]
Interpretability is a key means of justification, which is integral to biomedical applications.
We present an inclusive study on interpretability of word embeddings in the medical domain, focusing on the role of sparse methods.
Our experiments show that sparse word vectors are far more interpretable while preserving the downstream-task performance of the original vectors.
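A minimal sketch of one common route to sparse, more interpretable vectors, dictionary learning over a dense embedding matrix `X` (an assumed `(n_words, dim)` input); the exact sparse methods the study evaluates may differ:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

def sparsify(X, n_atoms=200, sparsity=0.1):
    dl = DictionaryLearning(n_components=n_atoms, alpha=sparsity,
                            transform_algorithm="lasso_lars", random_state=0)
    S = dl.fit_transform(X)    # sparse codes: few active "atoms" per word
    return S, dl.components_   # atoms often align with readable topics

# Inspecting the top-scoring words on one atom hints at its meaning:
# top_words = np.argsort(-S[:, atom_idx])[:10]
```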
arXiv Detail & Related papers (2020-05-11T13:56:58Z)
- Distributional semantic modeling: a revised technique to train term/word
vector space models applying the ontology-related approach [36.248702416150124]
We design a new technique for distributional semantic modeling with a neural-network-based approach to learn distributed term representations (or term embeddings).
Vec2graph is a Python library for visualizing word embeddings (term embeddings in our case) as dynamic and interactive graphs.
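Vec2graph's own API is not shown here; as a generic sketch of the underlying idea, rendering a word's nearest embedding-space neighbours as a graph with `networkx`, with `emb` an assumed word-to-vector dict:

```python
import numpy as np
import networkx as nx

def neighbour_graph(emb, word, k=8):
    words = list(emb)
    X = np.array([emb[w] for w in words])
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ (emb[word] / np.linalg.norm(emb[word]))
    G = nx.Graph()
    for j in np.argsort(-sims)[1:k + 1]:   # skip the word itself
        G.add_edge(word, words[j], weight=float(sims[j]))
    return G  # nx.draw(G) for a static view, or export to an interactive viewer
```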
arXiv Detail & Related papers (2020-03-06T18:27:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.