Word Embeddings Are Capable of Capturing Rhythmic Similarity of Words
- URL: http://arxiv.org/abs/2204.04833v2
- Date: Thu, 14 Apr 2022 07:28:15 GMT
- Title: Word Embeddings Are Capable of Capturing Rhythmic Similarity of Words
- Authors: Hosein Rezaei
- Abstract summary: Word embedding systems such as Word2Vec and GloVe are well-known in deep learning approaches to NLP.
In this work we investigated their usefulness in capturing rhythmic similarity of words instead.
The results show that the vectors these embeddings assign to rhyming words are more similar to each other than to those of other words.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Word embedding systems such as Word2Vec and GloVe are well-known in deep
learning approaches to NLP. This is largely due to their ability to capture
semantic relationships between words. In this work we investigated their
usefulness in capturing rhythmic similarity of words instead. The results show
that the vectors these embeddings assign to rhyming words are more similar to
each other than to those of other words. It is also revealed that GloVe performs
relatively better than Word2Vec in this regard. We also proposed a
first-of-its-kind metric for quantifying the rhythmic similarity of a pair of
words.
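The comparison the abstract describes can be sketched minimally: compute cosine similarity between embedding vectors and check whether rhyming pairs score higher than non-rhyming pairs. The words and vectors below are invented toy stand-ins, not the authors' data or their proposed metric; in practice, pretrained GloVe or Word2Vec vectors would replace the toy dictionary.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 4-dimensional vectors standing in for pretrained embeddings;
# the values are invented purely for illustration.
embeddings = {
    "cat":  np.array([0.9, 0.1, 0.3, 0.2]),
    "hat":  np.array([0.8, 0.2, 0.3, 0.1]),  # rhymes with "cat"
    "moon": np.array([0.1, 0.9, 0.7, 0.5]),  # does not rhyme
}

rhyme_sim = cosine_similarity(embeddings["cat"], embeddings["hat"])
other_sim = cosine_similarity(embeddings["cat"], embeddings["moon"])
print(rhyme_sim > other_sim)  # → True
```

The paper's finding amounts to this inequality holding on average over many rhyming versus non-rhyming pairs, more strongly for GloVe than for Word2Vec.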
Related papers
- The Impact of Word Splitting on the Semantic Content of Contextualized
Word Representations [3.4668147567693453]
Our analysis reveals, among other interesting findings, that the quality of representations of words that are split is often, but not always, worse than that of the embeddings of known words.
arXiv Detail & Related papers (2024-02-22T15:04:24Z) - Spoken Word2Vec: Learning Skipgram Embeddings from Speech [0.8901073744693314]
We show how shallow skipgram-like algorithms fail to encode distributional semantics when the input units are acoustically correlated.
We illustrate the potential of an alternative deep end-to-end variant of the model and examine the effects on the resulting embeddings.
arXiv Detail & Related papers (2023-11-15T19:25:29Z) - Neighboring Words Affect Human Interpretation of Saliency Explanations [65.29015910991261]
Word-level saliency explanations are often used to communicate feature-attribution in text-based models.
Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores.
We investigate how the marking of a word's neighboring words affects the explainee's perception of the word's importance in the context of a saliency explanation.
arXiv Detail & Related papers (2023-05-04T09:50:25Z) - Problems with Cosine as a Measure of Embedding Similarity for High
Frequency Words [45.58634797899206]
We find that cosine similarity underestimates the similarity of frequent words with other instances of the same word or other words across contexts.
We conjecture that this underestimation of similarity for high frequency words is due to differences in the representational geometry of high and low frequency words.
arXiv Detail & Related papers (2022-05-10T18:00:06Z) - Simple, Interpretable and Stable Method for Detecting Words with Usage
Change across Corpora [54.757845511368814]
The problem of comparing two bodies of text and searching for words that differ in their usage arises often in digital humanities and computational social science.
This is commonly approached by training word embeddings on each corpus, aligning the vector spaces, and looking for words whose cosine distance in the aligned space is large.
We propose an alternative approach that does not use vector space alignment, and instead considers the neighbors of each word.
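The neighbor-based alternative can be sketched as follows: for each word, take its top-k nearest neighbors in each corpus's embedding space and compare the neighbor sets; a low overlap flags a usage change. This is a simplified illustration with invented vectors, not the paper's exact procedure.

```python
import numpy as np

def top_k_neighbors(word, embeddings, k=2):
    """Return the k nearest neighbors of `word` by cosine similarity."""
    target = embeddings[word]
    scores = {}
    for other, vec in embeddings.items():
        if other != word:
            scores[other] = float(np.dot(target, vec) /
                                  (np.linalg.norm(target) * np.linalg.norm(vec)))
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def neighbor_overlap(word, emb_a, emb_b, k=2):
    """Jaccard overlap of a word's neighbor sets in two corpora;
    a small overlap suggests the word's usage differs between them."""
    na = top_k_neighbors(word, emb_a, k)
    nb = top_k_neighbors(word, emb_b, k)
    return len(na & nb) / len(na | nb)

# Toy embeddings: "bank" shifts from a finance sense (corpus A)
# toward a river sense (corpus B). Vectors are invented for illustration.
emb_a = {"bank": np.array([1.0, 0.0]), "money": np.array([0.9, 0.1]),
         "loan": np.array([0.8, 0.2]), "river": np.array([0.0, 1.0])}
emb_b = {"bank": np.array([0.0, 1.0]), "money": np.array([1.0, 0.0]),
         "loan": np.array([0.9, 0.1]), "river": np.array([0.1, 1.0])}

print(neighbor_overlap("bank", emb_a, emb_b))  # low overlap → usage change
```

Because each corpus's neighbors are computed within its own space, no vector-space alignment is needed, which is the point of the proposed method.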
arXiv Detail & Related papers (2021-12-28T23:46:00Z) - Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship
Attribution [74.27826764855911]
We employ syllabic quantity as a base for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts.
Our experiments, carried out on three different datasets, using two different machine learning methods, show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
arXiv Detail & Related papers (2021-10-27T06:25:31Z) - SemGloVe: Semantic Co-occurrences for GloVe from BERT [55.420035541274444]
GloVe learns word embeddings by leveraging statistical information from word co-occurrence matrices.
We propose SemGloVe, which distills semantic co-occurrences from BERT into static GloVe word embeddings.
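As background for the statistics GloVe consumes, a word co-occurrence matrix can be counted with a sliding window over a corpus. This sketch shows only that raw counting step, not GloVe's weighted least-squares factorization or SemGloVe's BERT-based distillation, and the window size is an arbitrary choice for illustration.

```python
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    """Count symmetric word co-occurrences within a fixed window;
    these are the raw statistics GloVe factorizes into embeddings."""
    counts = Counter()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                counts[(w, tokens[j])] += 1
    return counts

tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens, window=2)
print(counts[("the", "cat")])  # → 1
```

SemGloVe's contribution is to replace these window-based counts with co-occurrence statistics derived from BERT, while keeping the resulting embeddings static.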
arXiv Detail & Related papers (2020-12-30T15:38:26Z) - Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
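The entropy operationalisation described above can be written out directly as Shannon entropy over a word's meaning distribution. The distributions below are hypothetical, not drawn from the paper's data.

```python
import math

def meaning_entropy(probs):
    """Shannon entropy (in bits) of a word's meaning distribution,
    used here as an operationalisation of lexical ambiguity."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical meaning distributions for illustration:
monosemous = [1.0]                     # one meaning → zero ambiguity
ambiguous = [0.25, 0.25, 0.25, 0.25]   # four equally likely meanings

print(meaning_entropy(monosemous))  # → 0.0
print(meaning_entropy(ambiguous))   # → 2.0
```

A word with many equally plausible meanings has high entropy, and the paper's claim is that speakers offset that entropy with more informative contexts.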
arXiv Detail & Related papers (2020-10-05T17:19:10Z) - Comparative Analysis of Word Embeddings for Capturing Word Similarities [0.0]
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks.
Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings.
However, selecting the appropriate word embeddings is a challenging task, since the projected embedding space is not intuitive to humans.
arXiv Detail & Related papers (2020-05-08T01:16:03Z) - Dense Embeddings Preserving the Semantic Relationships in WordNet [2.9443230571766854]
We provide a novel way to generate low dimensional vector embeddings for noun and verb synsets in WordNet.
We call this embedding the Sense Spectrum (plural: Sense Spectra).
In order to create suitable labels for the training of sense spectra, we designed a new similarity measurement for noun and verb synsets in WordNet.
arXiv Detail & Related papers (2020-04-22T21:09:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.