The Typology of Polysemy: A Multilingual Distributional Framework
- URL: http://arxiv.org/abs/2006.01966v1
- Date: Tue, 2 Jun 2020 22:31:40 GMT
- Title: The Typology of Polysemy: A Multilingual Distributional Framework
- Authors: Ella Rabinovich, Yang Xu, Suzanne Stevenson
- Abstract summary: We present a novel framework that quantifies semantic affinity, the cross-linguistic similarity of lexical semantics for a concept.
Our results reveal an intricate interaction between semantic domains and extra-linguistic factors, beyond language phylogeny.
- Score: 6.753781783859273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lexical semantic typology has identified important cross-linguistic
generalizations about the variation and commonalities in polysemy
patterns---how languages package up meanings into words. Recent computational
research has enabled investigation of lexical semantics at a much larger scale,
but little work has explored lexical typology across semantic domains, nor the
factors that influence cross-linguistic similarities. We present a novel
computational framework that quantifies semantic affinity, the cross-linguistic
similarity of lexical semantics for a concept. Our approach defines a common
multilingual semantic space that enables a direct comparison of the lexical
expression of concepts across languages. We validate our framework against
empirical findings on lexical semantic typology at both the concept and domain
levels. Our results reveal an intricate interaction between semantic domains
and extra-linguistic factors, beyond language phylogeny, that co-shape the
typology of polysemy across languages.
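The abstract does not say how semantic affinity is computed, so the following is only a rough illustration of the general idea of comparing a concept's lexical expression across languages in one shared space. It assumes, hypothetically, that each language's word for a concept already has a vector in a common multilingual space, and it scores the concept by the mean pairwise cosine similarity of those vectors. The function names, the toy vectors, and the scoring rule are all invented for this sketch and are not taken from the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def concept_affinity(vectors_by_language):
    """Mean pairwise cosine similarity of one concept's vectors across languages.

    Hypothetical stand-in for a semantic-affinity score; assumes all vectors
    live in the same (already aligned) multilingual space.
    """
    pairs = list(combinations(sorted(vectors_by_language), 2))
    if not pairs:
        raise ValueError("need vectors for at least two languages")
    total = sum(
        cosine(vectors_by_language[a], vectors_by_language[b]) for a, b in pairs
    )
    return total / len(pairs)


# Toy, made-up vectors: MOON is expressed identically in all three languages,
# while HEAD diverges (as it might if the words carry different extra senses).
toy_space = {
    "MOON": {"en": [1.0, 0.0, 0.0], "es": [1.0, 0.0, 0.0], "de": [1.0, 0.0, 0.0]},
    "HEAD": {"en": [1.0, 0.0, 0.0], "es": [0.0, 1.0, 0.0], "de": [1.0, 1.0, 0.0]},
}

affinities = {concept: concept_affinity(vecs) for concept, vecs in toy_space.items()}
```

Under these toy inputs, the concept whose vectors coincide across languages receives the higher score, which is the kind of concept-level comparison a shared space makes possible.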
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step on measuring the role of shared semantics among subwords in the encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging the semantically similar subwords and their embeddings.
Inspections on the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
- The Acquisition of Semantic Relationships between words [0.0]
The study of semantic relationships has revealed a close connection between semantic relationships and the morphological characteristics of a language.
By delving into the relationship between semantic relationships and language morphology, we can gain deeper insights into how the underlying structure of words contributes to the interpretation and comprehension of language.
arXiv Detail & Related papers (2023-07-12T19:18:55Z)
- Agentività e telicità in GilBERTo: implicazioni cognitive [77.71680953280436]
The goal of this study is to investigate whether a Transformer-based neural language model infers lexical semantics.
The semantic properties considered are telicity (also combined with definiteness) and agentivity.
arXiv Detail & Related papers (2023-07-06T10:52:22Z)
- A study of conceptual language similarity: comparison and evaluation [0.3093890460224435]
An interesting line of research in natural language processing (NLP) aims to incorporate linguistic typology to bridge linguistic diversity.
Recent work has introduced a novel approach to defining language similarity based on how they represent basic concepts.
In this work, we study the conceptual similarity in detail and evaluate it extensively on a binary classification task.
arXiv Detail & Related papers (2023-05-22T18:28:02Z)
- Quantifying Synthesis and Fusion and their Impact on Machine Translation [79.61874492642691]
Literature in Natural Language Processing (NLP) typically labels a whole language with a strict type of morphology, e.g. fusional or agglutinative.
In this work, we propose to reduce the rigidity of such claims, by quantifying morphological typology at the word and segment level.
For computing synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as a case study.
Then, we analyse the relationship between machine translation quality and the degree of synthesis and fusion at word (nouns and verbs for English-Turkish,
arXiv Detail & Related papers (2022-05-06T17:04:58Z)
- A cognitively driven weighted-entropy model for embedding semantic categories in hyperbolic geometry [0.0]
An unsupervised and cognitively driven weighted-entropy method for embedding semantic categories in hyperbolic geometry is proposed.
The model is driven by two fields of research in cognitive linguistics: the statistical learning theory of language acquisition and the proposal of using high-dimensional networks to represent semantic knowledge in cognition.
Results show that this new approach can successfully model and map the semantic relationships of popularity and similarity for most of the basic color and kinship words in English.
arXiv Detail & Related papers (2021-12-13T18:33:45Z)
- Patterns of Lexical Ambiguity in Contextualised Language Models [9.747449805791092]
We introduce an extended, human-annotated dataset of graded word sense similarity and co-predication.
Both types of human judgements indicate that the similarity of polysemic interpretations falls in a continuum between identity of meaning and homonymy.
Our dataset appears to capture a substantial part of the complexity of lexical ambiguity, and can provide a realistic test bed for contextualised embeddings.
arXiv Detail & Related papers (2021-09-27T13:11:44Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
- Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures.
We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z)
- Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
arXiv Detail & Related papers (2020-01-21T19:09:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.