Topology of Word Embeddings: Singularities Reflect Polysemy
- URL: http://arxiv.org/abs/2011.09413v1
- Date: Wed, 18 Nov 2020 17:21:51 GMT
- Title: Topology of Word Embeddings: Singularities Reflect Polysemy
- Authors: Alexander Jakubowski, Milica Gašić, Marcus Zibrowius
- Abstract summary: We introduce a topological measure of polysemy based on persistent homology that correlates well with the actual number of meanings of a word.
We propose a simple, topologically motivated solution to the SemEval-2010 task on Word Sense Induction & Disambiguation.
- Score: 68.8204255655161
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The manifold hypothesis suggests that word vectors live on a submanifold
within their ambient vector space. We argue that we should, more accurately,
expect them to live on a pinched manifold: a singular quotient of a manifold
obtained by identifying some of its points. The identified, singular points
correspond to polysemous words, i.e. words with multiple meanings. Our point of
view suggests that monosemous and polysemous words can be distinguished based
on the topology of their neighbourhoods. We present two kinds of empirical
evidence to support this point of view: (1) We introduce a topological measure
of polysemy based on persistent homology that correlates well with the actual
number of meanings of a word. (2) We propose a simple, topologically motivated
solution to the SemEval-2010 task on Word Sense Induction & Disambiguation that
produces competitive results.
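The abstract does not spell out how the topological measure is computed. The following is a minimal sketch of the general idea that polysemy shows up in the topology of a word's neighbourhood. It is an illustration under stated assumptions, not the authors' method. It takes the k nearest neighbours of a word vector, removes the word itself (it "punctures" the neighbourhood), and computes 0-dimensional persistent homology of the remaining points under a Vietoris-Rips filtration. That computation amounts to reading off the edge lengths of a minimum spanning tree. It then counts the components that survive past a threshold. The function names, the Euclidean metric, and the parameters `k` and `threshold` are hypothetical choices made for this example.

```python
import numpy as np


def zero_dim_persistence(points):
    """Death times of 0-dim persistent homology (Vietoris-Rips) of a point cloud.

    Every point is born at 0 and a component dies when it merges with another.
    The merge heights are the edge lengths of a minimum spanning tree, found here
    with Kruskal's algorithm. The single component that never dies is omitted.
    """
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    edges = sorted((dists[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one of them dies at d
            parent[ri] = rj
            deaths.append(float(d))
    return deaths


def punctured_neighbourhood_score(vectors, index, k=20, threshold=1.0):
    """Hypothetical polysemy proxy: the number of components of the punctured
    k-nearest-neighbour neighbourhood of vectors[index] that outlive `threshold`."""
    dists = np.linalg.norm(vectors - vectors[index], axis=1)
    neighbours = [i for i in np.argsort(dists) if i != index][:k]  # puncture
    deaths = zero_dim_persistence(vectors[neighbours])
    return 1 + sum(d > threshold for d in deaths)  # +1 for the immortal class


# Toy data: a 3x3 grid with its centre removed, placed in one or two planes of R^4.
ring = np.array([[x, y] for x in (-1, 0, 1) for y in (-1, 0, 1) if (x, y) != (0, 0)],
                dtype=float)
zeros = np.zeros_like(ring)
origin = np.zeros((1, 4))
plane_a = np.hstack([ring, zeros])  # "sense A" occupies coordinates 0-1
plane_b = np.hstack([zeros, ring])  # "sense B" occupies coordinates 2-3
mono = np.vstack([origin, plane_a])           # smooth sheet through the word
poly = np.vstack([origin, plane_a, plane_b])  # two sheets pinched at the word

print(punctured_neighbourhood_score(mono, 0, threshold=1.2))  # 1
print(punctured_neighbourhood_score(poly, 0, threshold=1.2))  # 2
```

In the toy data, the word at the origin sits either on a single 2-dimensional sheet or at a point where two sheets are glued together. A punctured neighbourhood on a smooth sheet of dimension 2 or more stays connected. At the pinch point it splits into one piece per sheet, which raises the count. Real embeddings are noisy and high-dimensional, so choosing `k` and the threshold, or working with the whole persistence diagram instead of a single cutoff, is where the real difficulty lies.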
Related papers
- Analyzing Polysemy Evolution Using Semantic Cells [0.0]
This paper shows that word polysemy is an evolutionary consequence of the modification of Semantic Cells.
In particular, the analysis of a sequence of 1000 sentences in some order for each of the four senses of the word Spring, collected using ChatGPT, shows that the word acquires the most polysemy monotonically.
(2024-07-23T00:52:12Z)
- Domain Embeddings for Generating Complex Descriptions of Concepts in Italian Language [65.268245109828]
We propose a Distributional Semantic resource enriched with linguistic and lexical information extracted from electronic dictionaries.
The resource comprises 21 domain-specific matrices, one comprehensive matrix, and a Graphical User Interface.
Our model facilitates the generation of reasoned semantic descriptions of concepts by selecting matrices directly associated with concrete conceptual knowledge.
(2024-02-26T15:04:35Z)
- Neighboring Words Affect Human Interpretation of Saliency Explanations [65.29015910991261]
Word-level saliency explanations are often used to communicate feature-attribution in text-based models.
Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores.
We investigate how marking a word's neighboring words affects the explainee's perception of the word's importance in the context of a saliency explanation.
(2023-05-04T09:50:25Z)
- Word-Embeddings Distinguish Denominal and Root-Derived Verbs in Semitic [0.0]
We propose to test the validity of the two-level hypothesis in the context of Hebrew word embeddings.
If the two-level hypothesis is borne out, we expect state-of-the-art Hebrew word embeddings to encode (1) a noun, (2) a denominal derived from it (via an upper-level operation), and (3) a verb related to the noun.
We report that this hypothesis is verified by four embedding models of Hebrew: fastText, GloVe, Word2Vec and AlephBERT.
(2022-08-11T09:31:37Z)
- The Causal Structure of Semantic Ambiguities [0.0]
We identify two features: (1) joint plausibility degrees of different possible interpretations, and (2) causal structures according to which certain words play a more substantial role in the processes.
We applied this theory to a dataset of ambiguous phrases extracted from the psycholinguistics literature, together with human plausibility degrees that we collected for them.
(2022-06-14T12:56:34Z)
- Patterns of Lexical Ambiguity in Contextualised Language Models [9.747449805791092]
We introduce an extended, human-annotated dataset of graded word sense similarity and co-predication.
Both types of human judgements indicate that the similarity of polysemic interpretations falls on a continuum between identity of meaning and homonymy.
Our dataset appears to capture a substantial part of the complexity of lexical ambiguity, and can provide a realistic test bed for contextualised embeddings.
(2021-09-27T13:11:44Z)
- SemGloVe: Semantic Co-occurrences for GloVe from BERT [55.420035541274444]
GloVe learns word embeddings by leveraging statistical information from word co-occurrence matrices.
We propose SemGloVe, which distills semantic co-occurrences from BERT into static GloVe word embeddings.
(2020-12-30T15:38:26Z)
- SST-BERT at SemEval-2020 Task 1: Semantic Shift Tracing by Clustering in BERT-based Embedding Spaces [63.17308641484404]
We propose to identify clusters among different occurrences of each target word, considering these as representatives of different word meanings.
Disagreements in the obtained clusters naturally allow us to quantify the level of semantic shift for each target word in four target languages.
Our approach performs well both measured separately (per language) and overall, where we surpass all provided SemEval baselines.
(2020-10-02T08:38:40Z)
- It Means More if It Sounds Good: Yet Another Hypothesis Concerning the Evolution of Polysemous Words [9.434133337939498]
Using Ollivier-Ricci curvature over a large graph of synonyms to estimate polysemy, the paper shows empirically that words that are arguably easier to pronounce also tend to have multiple meanings.
(2020-03-12T12:55:50Z)
- Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
(2020-01-21T19:09:49Z)