On the Effects of Knowledge-Augmented Data in Word Embeddings
- URL: http://arxiv.org/abs/2010.01745v1
- Date: Mon, 5 Oct 2020 02:14:13 GMT
- Title: On the Effects of Knowledge-Augmented Data in Word Embeddings
- Authors: Diego Ramirez-Echavarria, Antonis Bikakis, Luke Dickens, Rob Miller,
Andreas Vlachidis
- Abstract summary: We propose a novel approach for linguistic knowledge injection through data augmentation to learn word embeddings.
We show that our knowledge augmentation approach improves the intrinsic characteristics of the learned embeddings without significantly altering their performance on a downstream text classification task.
- Score: 0.6749750044497732
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates techniques for knowledge injection into word
embeddings learned from large corpora of unannotated data. These
representations are trained with word co-occurrence statistics and do not
commonly exploit syntactic and semantic information from linguistic knowledge
bases, which potentially limits their transferability to domains with differing
language distributions or usages. We propose a novel approach for linguistic
knowledge injection through data augmentation to learn word embeddings that
enforce semantic relationships from the data, and systematically evaluate the
impact it has on the resulting representations. We show that our knowledge
augmentation approach improves the intrinsic characteristics of the learned
embeddings without significantly altering their performance on a downstream
text classification task.
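No code accompanies this abstract, so below is a minimal sketch of one way knowledge injection through data augmentation could look, assuming WordNet (via NLTK) as the linguistic knowledge base, synonym substitution as the augmentation rule, and gensim's Word2Vec as the embedding learner; all of these are illustrative assumptions, not the authors' exact setup.

```python
# A minimal sketch of knowledge-augmented data for word embeddings.
# Illustrative assumptions: WordNet as the knowledge base, synonym
# substitution as the augmentation rule, Word2Vec as the learner.
import random

from nltk.corpus import wordnet  # requires nltk.download('wordnet') first
from gensim.models import Word2Vec

def augment_sentence(tokens, p=0.2, rng=random):
    """Return a copy of `tokens` with some words swapped for a single-word
    WordNet synonym, so the corpus itself encodes semantic relationships."""
    out = []
    for tok in tokens:
        synsets = wordnet.synsets(tok)
        if synsets and rng.random() < p:
            lemmas = {l.name() for s in synsets for l in s.lemmas()
                      if "_" not in l.name()} - {tok}
            out.append(rng.choice(sorted(lemmas)) if lemmas else tok)
        else:
            out.append(tok)
    return out

corpus = [["the", "quick", "fox", "jumps"],
          ["embeddings", "capture", "word", "meaning"]]
# Train on the original sentences plus their knowledge-augmented variants.
augmented = corpus + [augment_sentence(s) for s in corpus]
model = Word2Vec(sentences=augmented, vector_size=50, min_count=1, epochs=20)
print(model.wv.most_similar("word", topn=3))
```

The intrinsic evaluations and downstream text classification comparison described in the abstract would then be run on `model.wv` versus a baseline trained on `corpus` alone.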
Related papers
- Capturing Pertinent Symbolic Features for Enhanced Content-Based
Misinformation Detection [0.0]
The detection of misleading content presents a significant hurdle due to its extreme linguistic and domain variability.
This paper analyzes the linguistic attributes that characterize this phenomenon and how well some of the most popular misinformation datasets represent such features.
We demonstrate that the appropriate use of pertinent symbolic knowledge in combination with neural language models is helpful in detecting misleading content.
arXiv Detail & Related papers (2024-01-29T16:42:34Z)
- Enhancing Context Through Contrast [0.4068270792140993]
We propose a novel Context Enhancement step to improve performance on neural machine translation.
Unlike other approaches, we do not explicitly augment the data but view languages as implicit augmentations.
Our method does not learn embeddings from scratch and can be generalised to any set of pre-trained embeddings.
arXiv Detail & Related papers (2024-01-06T22:13:51Z)
- SememeASR: Boosting Performance of End-to-End Speech Recognition against
Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge [58.979490858061745]
We introduce sememe-based semantic knowledge into speech recognition.
Our experiments show that sememe information can improve the effectiveness of speech recognition.
In addition, our further experiments show that sememe knowledge can improve the model's recognition of long-tailed data.
arXiv Detail & Related papers (2023-09-04T08:35:05Z)
- Joint Language Semantic and Structure Embedding for Knowledge Graph
Completion [66.15933600765835]
We propose to jointly embed the semantics in the natural language description of the knowledge triplets with their structure information.
Our method embeds knowledge graphs for the completion task via fine-tuning pre-trained language models.
Our experiments on a variety of knowledge graph benchmarks demonstrate the state-of-the-art performance of our method.
arXiv Detail & Related papers (2022-09-19T02:41:02Z)
- An Empirical Investigation of Commonsense Self-Supervision with
Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Semantic TrueLearn: Using Semantic Knowledge Graphs in Recommendation
Systems [22.387120578306277]
This work aims to advance towards building a state-aware educational recommendation system that incorporates semantic relatedness.
We introduce a novel learner model that exploits this semantic relatedness between knowledge components in learning resources using the Wikipedia link graph.
Our experiments with a large dataset demonstrate that this new semantic version of the TrueLearn algorithm achieves statistically significant improvements in predictive performance.
arXiv Detail & Related papers (2021-12-08T16:23:27Z)
- On the Impact of Knowledge-based Linguistic Annotations in the Quality
of Scientific Embeddings [0.0]
We conduct a study on the use of explicit linguistic annotations to generate embeddings from a scientific corpus.
Our results show how the effect of such annotations in the embeddings varies depending on the evaluation task.
In general, we observe that learning embeddings using linguistic annotations contributes to achieving better evaluation results.
arXiv Detail & Related papers (2021-04-13T13:51:22Z)
- Knowledge-Base Enriched Word Embeddings for Biomedical Domain [5.086571902225929]
We propose a new word embedding based model for biomedical domain that jointly leverages the information from available corpora and domain knowledge.
Unlike existing approaches, the proposed methodology is simple yet adept at accurately capturing the precise knowledge available in domain resources.
arXiv Detail & Related papers (2021-02-20T18:18:51Z)
- CoLAKE: Contextualized Language and Knowledge Embedding [81.90416952762803]
We propose the Contextualized Language and Knowledge Embedding (CoLAKE).
CoLAKE jointly learns contextualized representations for both language and knowledge with an extended objective.
We conduct experiments on knowledge-driven tasks, knowledge probing tasks, and language understanding tasks.
arXiv Detail & Related papers (2020-10-01T11:39:32Z)
- Improving Machine Reading Comprehension with Contextualized Commonsense
Knowledge [62.46091695615262]
We aim to extract commonsense knowledge to improve machine reading comprehension.
We propose to represent relations implicitly by situating structured knowledge in a context.
We employ a teacher-student paradigm to inject multiple types of contextualized knowledge into a student machine reader.
arXiv Detail & Related papers (2020-09-12T17:20:01Z)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation
Learning [73.0598186896953]
We present two self-supervised tasks that learn over raw text with guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme (a minimal sketch follows this list).
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
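As a concrete illustration of the entity masking scheme mentioned in the last entry above, here is a minimal sketch of whole-entity masking for masked language model pre-training; the example sentence, the entity spans, and the `[MASK]` token are illustrative assumptions rather than details taken from that paper.

```python
# A minimal sketch of whole-entity masking for masked language model
# pre-training. Sentence, spans, and [MASK] token are illustrative.
import random

MASK = "[MASK]"

def mask_entities(tokens, entity_spans, p=0.5, rng=random):
    """Mask entire entity mentions, given as (start, end) token spans,
    with probability p, so the model must recover whole entities rather
    than isolated subwords."""
    tokens = list(tokens)
    for start, end in entity_spans:
        if rng.random() < p:
            for i in range(start, end):
                tokens[i] = MASK
    return tokens

sentence = ["Marie", "Curie", "won", "the", "Nobel", "Prize"]
spans = [(0, 2), (4, 6)]  # assumed knowledge-graph-linked mentions
print(mask_entities(sentence, spans, p=1.0))
# -> ['[MASK]', '[MASK]', 'won', 'the', '[MASK]', '[MASK]']
```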
This list is automatically generated from the titles and abstracts of the papers on this site.