It Means More if It Sounds Good: Yet Another Hypothesis Concerning the
Evolution of Polysemous Words
- URL: http://arxiv.org/abs/2003.05758v2
- Date: Thu, 7 Jan 2021 11:51:44 GMT
- Title: It Means More if It Sounds Good: Yet Another Hypothesis Concerning the
Evolution of Polysemous Words
- Authors: Ivan P. Yamshchikov, Cyrille Merleau Nono Saha, Igor Samenko, Jürgen Jost
- Abstract summary: Using Ollivier-Ricci curvature over a large graph of synonyms to estimate polysemy, it shows empirically that words that are arguably easier to pronounce also tend to have multiple meanings.
- Score: 9.434133337939498
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This position paper looks into the formation of language and shows ties
between structural properties of words in the English language and their
polysemy. Using Ollivier-Ricci curvature over a large graph of synonyms to
estimate polysemy, it shows empirically that words that are arguably easier
to pronounce also tend to have multiple meanings.
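The curvature estimate is easy to illustrate. Below is a minimal sketch of Ollivier-Ricci curvature on a toy synonym graph, assuming uniform probability measures on node neighborhoods and using the POT library for the Wasserstein-1 distance; the words, edges, and tiny graph size are illustrative stand-ins for the large synonym graph used in the paper.

```python
# Minimal sketch: Ollivier-Ricci curvature on a toy synonym graph.
# kappa(x, y) = 1 - W1(mu_x, mu_y) / d(x, y), with mu_x uniform on the
# neighbors of x. The words and edges are invented for illustration.
# Requires: networkx, numpy, POT (pip install pot).
import networkx as nx
import numpy as np
import ot  # POT: Python Optimal Transport

G = nx.Graph()
G.add_edges_from([
    ("spring", "leap"), ("spring", "source"), ("spring", "coil"),
    ("leap", "jump"), ("source", "origin"), ("coil", "loop"),
])

# Shortest-path distances serve as the ground metric for transport.
dist = dict(nx.all_pairs_shortest_path_length(G))

def ollivier_ricci(G, x, y):
    """Ollivier-Ricci curvature of the edge (x, y)."""
    nbrs_x, nbrs_y = list(G.neighbors(x)), list(G.neighbors(y))
    mu_x = np.full(len(nbrs_x), 1.0 / len(nbrs_x))
    mu_y = np.full(len(nbrs_y), 1.0 / len(nbrs_y))
    M = np.array([[dist[u][v] for v in nbrs_y] for u in nbrs_x], float)
    w1 = ot.emd2(mu_x, mu_y, M)  # exact Wasserstein-1 distance
    return 1.0 - w1 / dist[x][y]

for u, v in G.edges():
    print(f"kappa({u}, {v}) = {ollivier_ricci(G, u, v):+.3f}")
```

In this toy graph, edges incident to the high-degree node "spring" come out more negatively curved than edges between low-degree nodes, which is the intuition behind using curvature as a polysemy proxy. Dedicated implementations such as the GraphRicciCurvature package exist; the explicit version above just keeps the transport step visible.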
Related papers
- Analyzing Polysemy Evolution Using Semantic Cells [0.0]
This paper shows that word polysemy is an evolutionary consequence of the modification of Semantic Cells.
In particular, an analysis of ordered sequences of 1,000 sentences, collected using ChatGPT for each of the four senses of the word "spring", shows that the word acquires polysemy monotonically.
arXiv Detail & Related papers (2024-07-23T00:52:12Z)
- Correlation Does Not Imply Compensation: Complexity and Irregularity in the Lexicon [48.00488140516432]
We find evidence of a positive relationship between morphological irregularity and phonotactic complexity within languages.
We also find weak evidence of a negative relationship between word length and morphological irregularity.
arXiv Detail & Related papers (2024-06-07T18:09:21Z)
- Neighboring Words Affect Human Interpretation of Saliency Explanations [65.29015910991261]
Word-level saliency explanations are often used to communicate feature attribution in text-based models.
Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores.
We investigate how the marking of a word's neighboring words affects the explainee's perception of the word's importance in the context of a saliency explanation.
arXiv Detail & Related papers (2023-05-04T09:50:25Z)
- Lost in Context? On the Sense-wise Variance of Contextualized Word Embeddings [11.475144702935568]
We quantify how much the contextualized embeddings of each word sense vary across contexts in typical pre-trained models (a toy sketch of this kind of measurement appears after this list).
We find that word representations are position-biased: the first words in different contexts tend to be more similar to each other.
arXiv Detail & Related papers (2022-08-20T12:27:25Z)
- Patterns of Lexical Ambiguity in Contextualised Language Models [9.747449805791092]
We introduce an extended, human-annotated dataset of graded word sense similarity and co-predication.
Both types of human judgements indicate that the similarity of polysemic interpretations falls on a continuum between identity of meaning and homonymy.
Our dataset appears to capture a substantial part of the complexity of lexical ambiguity, and can provide a realistic test bed for contextualised embeddings.
arXiv Detail & Related papers (2021-09-27T13:11:44Z)
- Let's Play Mono-Poly: BERT Can Reveal Words' Polysemy Level and Partitionability into Senses [4.915907527975786]
Pre-trained language models (LMs) encode rich information about linguistic structure, but their knowledge about lexical polysemy remains unclear.
We propose a novel experimental setup for analysing this knowledge in LMs specifically trained for different languages.
We demonstrate that BERT-derived representations reflect words' polysemy level and their partitionability into senses.
arXiv Detail & Related papers (2021-04-29T23:15:13Z)
- Disambiguatory Signals are Stronger in Word-initial Positions [48.18148856974974]
We point out the confounds in existing methods for comparing the informativeness of segments early in the word versus later in the word.
We find evidence across hundreds of languages that there is indeed a cross-linguistic tendency to front-load information in words.
arXiv Detail & Related papers (2021-02-03T18:19:16Z)
- Topology of Word Embeddings: Singularities Reflect Polysemy [68.8204255655161]
We introduce a topological measure of polysemy based on persistent homology that correlates well with the actual number of meanings of a word (a toy illustration of the persistent-homology idea appears after this list).
We propose a simple, topologically motivated solution to the SemEval-2010 task on Word Sense Induction & Disambiguation.
arXiv Detail & Related papers (2020-11-18T17:21:51Z)
- Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of the meanings it can take (a toy version of this measure appears after this list).
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
arXiv Detail & Related papers (2020-10-05T17:19:10Z)
- Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
arXiv Detail & Related papers (2020-01-21T19:09:49Z)
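For the "Lost in Context?" entry above, here is a minimal sketch of the kind of measurement it describes: how much one word's contextualized embedding varies across contexts. The model choice (bert-base-uncased), the four example sentences, and the assumption that "spring" surfaces as a single WordPiece token are illustrative, not the paper's actual setup.

```python
# Minimal sketch: variance of one word's contextualized embedding across
# contexts. Requires: torch, transformers (downloads bert-base-uncased).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

contexts = [
    "the spring flowers bloomed early this year",       # season
    "the clock is driven by a steel spring",            # coil
    "they drank from a cold mountain spring",           # water source
    "the cat can spring onto the table in one motion",  # jump
]
target_id = tok.convert_tokens_to_ids("spring")  # single WordPiece here

vecs = []
for sent in contexts:
    enc = tok(sent, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden)
    pos = enc["input_ids"][0].tolist().index(target_id)
    vecs.append(hidden[pos])

stack = torch.stack(vecs)  # (n_contexts, hidden)
print("mean per-dimension variance across contexts:",
      stack.var(dim=0).mean().item())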
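For "Topology of Word Embeddings": a toy sketch of a persistent-homology polysemy proxy. The point cloud is synthetic (two Gaussian blobs standing in for the contextual vectors of a two-sense word) and the persistence threshold is hand-tuned; the paper's actual measure is built on real embedding neighborhoods.

```python
# Minimal sketch: counting persistent connected components (H0) as a
# crude proxy for the number of sense clusters. Requires: numpy, ripser.
import numpy as np
from ripser import ripser

rng = np.random.default_rng(0)
# Two well-separated blobs stand in for the contextual vectors of a
# word with two senses.
cloud = np.vstack([
    rng.normal(loc=0.0, scale=0.05, size=(50, 10)),
    rng.normal(loc=2.0, scale=0.05, size=(50, 10)),
])

# H0 persistence diagram: one bar dies each time two components merge.
h0 = ripser(cloud, maxdim=0)["dgms"][0]
deaths = h0[np.isfinite(h0[:, 1]), 1]

# Components that persist past a hand-tuned threshold, plus the one
# infinite bar, estimate the number of sense clusters.
threshold = 1.0
print("estimated sense clusters:", int(np.sum(deaths > threshold)) + 1)  # 2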
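For "Speakers Fill Lexical Semantic Gaps with Context": a toy version of ambiguity-as-entropy. Sense frequencies here come from WordNet's SemCor-derived lemma counts with add-one smoothing, which simplifies the paper's corpus-based estimate; the word list is arbitrary.

```python
# Minimal sketch: lexical ambiguity as the entropy of a word's sense
# distribution, alongside its WordNet synonym count. Requires: nltk,
# with the 'wordnet' corpus downloaded (nltk.download('wordnet')).
import math
from nltk.corpus import wordnet as wn

def sense_entropy(word, pos=wn.NOUN):
    """Entropy (bits) of the word's sense distribution from sense counts."""
    counts = []
    for synset in wn.synsets(word, pos=pos):
        for lemma in synset.lemmas():
            if lemma.name() == word:
                counts.append(lemma.count() + 1)  # add-one smoothing
    if not counts:
        return 0.0
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts)

def n_synonyms(word, pos=wn.NOUN):
    """Number of distinct WordNet synonyms across the word's synsets."""
    syns = {l.name() for s in wn.synsets(word, pos=pos) for l in s.lemmas()}
    syns.discard(word)
    return len(syns)

for w in ["spring", "bank", "sofa"]:
    print(f"{w}: H = {sense_entropy(w):.2f} bits, synonyms = {n_synonyms(w)}")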
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.