Semantic Relatedness and Taxonomic Word Embeddings
- URL: http://arxiv.org/abs/2002.06235v1
- Date: Fri, 14 Feb 2020 20:02:11 GMT
- Title: Semantic Relatedness and Taxonomic Word Embeddings
- Authors: Magdalena Kacmajor and John D. Kelleher and Filip Klubicka and Alfredo Maldonado
- Abstract summary: We show that there are different types of semantic relatedness and that different lexical representations encode different forms of relatedness.
We present experiments that analyse taxonomic embeddings that have been trained on a synthetic corpus that has been generated via a random walk over a taxonomy.
We explore the interactions between the relative sizes of natural and synthetic corpora on the performance of embeddings when taxonomic and thematic embeddings are combined.
- Score: 2.47944699884651
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper connects a series of papers dealing with taxonomic word
embeddings. It begins by noting that there are different types of semantic
relatedness and that different lexical representations encode different forms
of relatedness. A particularly important distinction within semantic
relatedness is that of thematic versus taxonomic relatedness. Next, we present
a number of experiments that analyse taxonomic embeddings that have been
trained on a synthetic corpus that has been generated via a random walk over a
taxonomy. These experiments demonstrate how the properties of the synthetic
corpus, such as the percentage of rare words, are affected by the shape of the
knowledge graph the corpus is generated from. Finally, we explore the
interactions between the relative sizes of natural and synthetic corpora on the
performance of embeddings when taxonomic and thematic embeddings are combined.
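
The abstract's idea of generating a synthetic corpus by a random walk over a taxonomy can be illustrated with a minimal sketch. It is not the authors' implementation: the toy taxonomy, the function names (`build_neighbours`, `random_walk_corpus`, `rare_word_fraction`), the uniform choice of neighbour, and the definition of a "rare" word as one occurring at most `max_count` times are all assumptions made for illustration.

```python
import random
from collections import Counter

# Toy taxonomy (hypothetical): child -> parent (hypernym) edges.
HYPERNYM = {
    "dog": "canine", "wolf": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal", "mammal": "animal",
}

def build_neighbours(hypernym):
    """Build an undirected adjacency list over hypernym/hyponym edges."""
    neighbours = {}
    for child, parent in hypernym.items():
        neighbours.setdefault(child, []).append(parent)
        neighbours.setdefault(parent, []).append(child)
    return neighbours

def random_walk_corpus(neighbours, n_sentences, walk_length, seed=0):
    """Generate pseudo-sentences; each one is a random walk over the graph."""
    rng = random.Random(seed)
    nodes = sorted(neighbours)
    corpus = []
    for _ in range(n_sentences):
        node = rng.choice(nodes)          # assumed: uniform start node
        sentence = [node]
        for _ in range(walk_length - 1):
            node = rng.choice(neighbours[node])  # assumed: uniform step
            sentence.append(node)
        corpus.append(sentence)
    return corpus

def rare_word_fraction(corpus, max_count):
    """Fraction of vocabulary types that occur at most max_count times."""
    counts = Counter(token for sentence in corpus for token in sentence)
    rare = sum(1 for c in counts.values() if c <= max_count)
    return rare / len(counts)

neighbours = build_neighbours(HYPERNYM)
corpus = random_walk_corpus(neighbours, n_sentences=50, walk_length=5)
fraction = rare_word_fraction(corpus, max_count=10)
```

Each pseudo-sentence could then be fed to a standard embedding trainer in place of natural text. Changing the shape of `HYPERNYM` (for example, making it deeper or more branching) changes how often each node is visited, which is the kind of effect on rare-word percentage that the abstract describes.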
Related papers
- Entropy and type-token ratio in gigaword corpora [0.0]
We investigate entropy and type-token ratio, two metrics for lexical diversity, in six massive linguistic datasets in English, Spanish, and Turkish.
We find a functional relation between entropy and type-token ratio that holds across the corpora under consideration.
Our results contribute to the theoretical understanding of text structure and offer practical implications for fields like natural language processing.
arXiv Detail & Related papers (2024-11-15T14:40:59Z) - Estimating the Influence of Sequentially Correlated Literary Properties in Textual Classification: A Data-Centric Hypothesis-Testing Approach [4.161155428666988]
Stylometry aims to distinguish authors by analyzing literary traits assumed to reflect semi-conscious choices distinct from elements like genre or theme.
While some literary properties, such as thematic content, are likely to manifest as correlations between adjacent text units, others, like authorial style, may be independent thereof.
We introduce a hypothesis-testing approach to evaluate the influence of sequentially correlated literary properties on text classification.
arXiv Detail & Related papers (2024-11-07T18:28:40Z) - Data-driven Coreference-based Ontology Building [48.995395445597225]
Coreference resolution is traditionally used as a component in individual document understanding.
We take a more global view and explore what we can learn about a domain from the set of all document-level coreference relations.
We release the resulting coreference chains under a Creative Commons license, along with the code.
arXiv Detail & Related papers (2024-10-22T14:30:40Z) - Agentività e telicità in GilBERTo: implicazioni cognitive [77.71680953280436]
The goal of this study is to investigate whether a Transformer-based neural language model infers lexical semantics.
The semantic properties considered are telicity (also combined with definiteness) and agentivity.
arXiv Detail & Related papers (2023-07-06T10:52:22Z) - Probing Taxonomic and Thematic Embeddings for Taxonomic Information [2.9874726192215157]
Modelling taxonomic and thematic relatedness is important for building AI with comprehensive natural language understanding.
We design a new hypernym-hyponym probing task and perform a comparative probing study of taxonomic and thematic SGNS and GloVe embeddings.
Experiments indicate that both types of embeddings encode some taxonomic information, but the amount, as well as the geometric properties of the encodings, are independently related to both the encoder architecture and the embedding training data.
arXiv Detail & Related papers (2023-01-25T15:59:26Z) - Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm to further discover the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z) - Quantifying Synthesis and Fusion and their Impact on Machine Translation [79.61874492642691]
Work in Natural Language Processing (NLP) typically labels a whole language with a strict type of morphology, e.g. fusional or agglutinative.
In this work, we propose to reduce the rigidity of such claims, by quantifying morphological typology at the word and segment level.
For computing synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as a case study.
Then, we analyse the relationship between machine translation quality and the degree of synthesis and fusion at word (nouns and verbs for English-Turkish,
arXiv Detail & Related papers (2022-05-06T17:04:58Z) - Decomposing lexical and compositional syntax and semantics with deep language models [82.81964713263483]
The activations of language transformers like GPT2 have been shown to linearly map onto brain activity during speech comprehension.
Here, we propose a taxonomy to factorize the high-dimensional activations of language models into four classes: lexical, compositional, syntactic, and semantic representations.
The results highlight two findings. First, compositional representations recruit a more widespread cortical network than lexical ones, and encompass the bilateral temporal, parietal and prefrontal cortices.
arXiv Detail & Related papers (2021-03-02T10:24:05Z) - Comparative Probing of Lexical Semantics Theories for Cognitive Plausibility and Technological Usefulness [1.028961895672321]
Lexical semantics theories differ in advocating that the meaning of words is represented as an inference graph, a feature mapping or a vector space.
We systematically probe different lexical semantics theories for their levels of cognitive plausibility and of technological usefulness.
arXiv Detail & Related papers (2020-11-16T14:46:08Z) - A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z) - Exploiting Non-Taxonomic Relations for Measuring Semantic Similarity and Relatedness in WordNet [0.0]
This paper explores the benefits of using all types of non-taxonomic relations in large linked data, such as the WordNet knowledge graph.
We propose a holistic poly-relational approach based on a new relation-based information content and non-taxonomic-based weighted paths.
arXiv Detail & Related papers (2020-06-22T09:59:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.