On the Impact of Knowledge-based Linguistic Annotations in the Quality of Scientific Embeddings
- URL: http://arxiv.org/abs/2104.06200v1
- Date: Tue, 13 Apr 2021 13:51:22 GMT
- Title: On the Impact of Knowledge-based Linguistic Annotations in the Quality of Scientific Embeddings
- Authors: Andres Garcia-Silva, Ronald Denaux, Jose Manuel Gomez-Perez
- Abstract summary: We conduct a study on the use of explicit linguistic annotations to generate embeddings from a scientific corpus.
Our results show how the effect of such annotations on the embeddings varies depending on the evaluation task.
In general, we observe that learning embeddings using linguistic annotations contributes to achieving better evaluation results.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In essence, embedding algorithms work by optimizing the distance
between a word and its usual context in order to generate an embedding space
that encodes the distributional representation of words. In addition to single
words or word pieces, other features resulting from the linguistic analysis of
text, including lexical, grammatical and semantic information, can be used to
improve the quality of embedding spaces. However, until now we have lacked a
precise understanding of the impact that such individual annotations and their
possible combinations may have on the quality of the embeddings. In this
paper, we conduct a comprehensive study on the use of explicit linguistic
annotations to generate embeddings from a scientific corpus and quantify their
impact on the resulting representations. Our results show how the effect of
such annotations on the embeddings varies depending on the evaluation task. In
general, we observe that learning embeddings using linguistic annotations
contributes to achieving better evaluation results.
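To make the setup concrete, here is a minimal sketch of learning embeddings over annotation-augmented tokens, assuming lemma and part-of-speech annotations from spaCy and a skip-gram model from gensim; the token format and tools are illustrative, not the authors' exact pipeline.

```python
# Illustrative sketch, not the authors' pipeline: replace surface tokens
# with lemma|POS compound tokens and learn skip-gram embeddings over them.
# Requires: pip install spacy gensim && python -m spacy download en_core_web_sm
import spacy
from gensim.models import Word2Vec

nlp = spacy.load("en_core_web_sm")

corpus = [
    "Embedding algorithms optimize the distance between a word and its context.",
    "Linguistic annotations may improve the quality of embedding spaces.",
]

def annotate(sentence):
    # One possible "explicit linguistic annotation": lemma plus POS tag.
    return [f"{tok.lemma_}|{tok.pos_}" for tok in nlp(sentence) if not tok.is_space]

sentences = [annotate(s) for s in corpus]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)
print(model.wv.index_to_key[:5])  # vocabulary is now annotation-aware
```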
Related papers
- Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
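A hedged sketch of the general recipe follows; `call_llm` is a hypothetical stand-in for whatever chat-completion client is available, and the prompt wording is an assumption, not the paper's.

```python
# Sketch only: decompose a text into implied propositions with an LLM and
# use those propositions alongside the literal text for representation
# learning. `call_llm` is a hypothetical placeholder, not a real library call.

PROMPT = (
    "List the propositions that are implied by, presupposed by, or can "
    "reasonably be inferred from the following text, one per line:\n\n{text}"
)

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def decompose(text: str) -> list[str]:
    response = call_llm(PROMPT.format(text=text))
    return [line.lstrip("- ").strip() for line in response.splitlines() if line.strip()]
```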
arXiv Detail & Related papers (2023-05-23T23:45:20Z)
- Compositional Generalization in Grounded Language Learning via Induced Model Sparsity [81.38804205212425]
We consider simple language-conditioned navigation problems in a grid world environment with disentangled observations.
We design an agent that encourages sparse correlations between words in the instruction and attributes of objects, composing them together to find the goal.
Our agent maintains a high level of performance on goals containing novel combinations of properties even when learning from a handful of demonstrations.
arXiv Detail & Related papers (2022-07-06T08:46:27Z)
- Human-in-the-Loop Refinement of Word Embeddings [0.0]
We propose a system that incorporates an adaptation of word embedding post-processing, which we call "interactive refitting"
Our approach allows a human to identify and address potential quality issues with word embeddings interactively.
It also allows for better insight into what effect word embeddings, and refinements to word embeddings, have on machine learning pipelines.
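As an illustration of the idea, refitting can be as small as pulling a flagged vector toward human-approved neighbors and away from rejected ones; the update rule below is one simple realization, not the paper's algorithm.

```python
# Toy refitting step from human feedback; illustrative only.
import numpy as np

def refit(vectors, word, similar, dissimilar, lr=0.1):
    v = vectors[word].copy()
    for s in similar:
        v += lr * (vectors[s] - v)          # pull toward approved neighbors
    for d in dissimilar:
        v -= lr * (vectors[d] - v)          # push away from rejected ones
    vectors[word] = v / np.linalg.norm(v)   # renormalize
    return vectors

vecs = {w: np.random.randn(50) for w in ["bank", "river", "finance"]}
vecs = refit(vecs, "bank", similar=["finance"], dissimilar=["river"])
```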
arXiv Detail & Related papers (2021-10-06T16:10:32Z)
- An Empirical Study on Leveraging Position Embeddings for Target-oriented Opinion Words Extraction [13.765146062545048]
Target-oriented opinion words extraction (TOWE) is a new subtask of target-oriented sentiment analysis.
We show that BiLSTM-based models can effectively encode position information into word representations.
We also adapt a graph convolutional network (GCN) to enhance word representations by incorporating syntactic information.
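A compact sketch of this common recipe (hyperparameters and tagging scheme are assumptions, and the paper's exact architecture differs): concatenate a learned relative-position embedding with each word embedding before the BiLSTM.

```python
# Sketch: position-aware BiLSTM tagger for opinion word extraction.
import torch
import torch.nn as nn

class PositionAwareBiLSTM(nn.Module):
    def __init__(self, vocab, max_dist=100, wdim=100, pdim=25, hidden=128, tags=3):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, wdim)
        # distances to the target in [-max_dist, max_dist], shifted to >= 0
        self.pos_emb = nn.Embedding(2 * max_dist + 1, pdim)
        self.lstm = nn.LSTM(wdim + pdim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, tags)  # e.g. BIO tags over opinion words

    def forward(self, words, dists):
        x = torch.cat([self.word_emb(words), self.pos_emb(dists)], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)
```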
arXiv Detail & Related papers (2021-09-02T22:49:45Z)
- Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that an informational approach to representing the meaning of a text has offered a way to effectively predict the scientific impact of research papers.
arXiv Detail & Related papers (2021-04-26T20:37:13Z)
- Understanding Synonymous Referring Expressions via Contrastive Features [105.36814858748285]
We develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels.
We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets.
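The paper's image- and instance-level losses are more involved, but a generic InfoNCE-style contrastive loss conveys the core mechanism; this sketch is a standard formulation, not the authors' objective.

```python
# Generic contrastive (InfoNCE-style) loss over one anchor; sketch only.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.07):
    # anchor, positive: (d,); negatives: (n, d)
    candidates = torch.cat([positive.unsqueeze(0), negatives], dim=0)
    logits = F.cosine_similarity(anchor.unsqueeze(0), candidates) / temperature
    target = torch.zeros(1, dtype=torch.long)  # the positive sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), target)
```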
arXiv Detail & Related papers (2021-04-20T17:56:24Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- On the Effects of Knowledge-Augmented Data in Word Embeddings [0.6749750044497732]
We propose a novel approach for linguistic knowledge injection through data augmentation to learn word embeddings.
We show our knowledge augmentation approach improves the intrinsic characteristics of the learned embeddings while not significantly altering their results on a downstream text classification task.
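One plausible reading of such augmentation, sketched with WordNet via NLTK; the replacement policy below is an assumption, not the authors' exact method.

```python
# Sketch: enlarge the training corpus with WordNet-synonym variants before
# learning embeddings. Requires: import nltk; nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def synonym_variants(tokens):
    variants = []
    for i, tok in enumerate(tokens):
        for syn in wn.synsets(tok)[:1]:          # first sense only, for brevity
            for lemma in syn.lemma_names()[:2]:
                if lemma.lower() != tok.lower():
                    variants.append(tokens[:i] + [lemma] + tokens[i + 1:])
    return variants

corpus = [["scientists", "analyze", "data"]]
for sent in list(corpus):
    corpus.extend(synonym_variants(sent))
```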
arXiv Detail & Related papers (2020-10-05T02:14:13Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Compass-aligned Distributional Embeddings for Studying Semantic Differences across Corpora [14.993021283916008]
We present a framework to support cross-corpora language studies with word embeddings.
CADE is the core component of our framework and solves the key problem of aligning the embeddings generated from different corpora.
The results of our experiments suggest that CADE achieves state-of-the-art or superior performance on tasks where several competing approaches are available.
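The compass idea can be sketched in a few lines (a simplified illustration of CADE's mechanism, not its implementation): context vectors trained on the concatenated corpora are frozen, and each corpus's word vectors are then learned against that shared matrix, so the resulting spaces are mutually aligned.

```python
# Sketch of one skip-gram negative-sampling update with a frozen "compass"
# context matrix; in CADE the compass comes from the concatenated corpora.
import numpy as np

def sgns_step(W_target, C_frozen, center, pairs, lr=0.025):
    # pairs: [(context_index, label)] with label 1.0 for observed context
    # words and 0.0 for negative samples. Only W_target is updated.
    v = W_target[center]
    for ctx, label in pairs:
        score = 1.0 / (1.0 + np.exp(-v @ C_frozen[ctx]))  # sigmoid
        v -= lr * (score - label) * C_frozen[ctx]
    W_target[center] = v
```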
arXiv Detail & Related papers (2020-04-13T15:46:47Z)
- A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings [10.871587311621974]
This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings.
Existing word vectors are projected to a common semantic space using linear transformations and averaging.
The resulting cross-lingual meta-embeddings also exhibit excellent cross-lingual transfer learning capabilities.
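A minimal sketch of the projection-and-average recipe (simplified to a single least-squares map per source space onto a pivot; names and shapes are assumed for illustration):

```python
# Sketch: project each source embedding space onto a pivot space with a
# linear map fit on shared vocabulary, then average the projected vectors.
import numpy as np

def learn_projection(src, pivot, shared_words):
    X = np.stack([src[w] for w in shared_words])    # (n, d_src)
    Y = np.stack([pivot[w] for w in shared_words])  # (n, d_pivot)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)       # least-squares transform
    return W

def meta_embed(spaces, pivot, shared_words):
    maps = [learn_projection(s, pivot, shared_words) for s in spaces]
    meta = {}
    for w in set(pivot).union(*spaces):
        projected = [s[w] @ W for s, W in zip(spaces, maps) if w in s]
        if w in pivot:
            projected.append(pivot[w])
        meta[w] = np.mean(projected, axis=0)        # average in the pivot space
    return meta
```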
arXiv Detail & Related papers (2020-01-17T15:42:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.