Patterns of Closeness and Abstractness in Colexifications: The Case of
Indigenous Languages in the Americas
- URL: http://arxiv.org/abs/2312.11069v1
- Date: Mon, 18 Dec 2023 10:06:50 GMT
- Title: Patterns of Closeness and Abstractness in Colexifications: The Case of
Indigenous Languages in the Americas
- Authors: Yiyi Chen, Johannes Bjerva
- Abstract summary: Colexification refers to linguistic phenomena where multiple concepts (meanings) are expressed by the same lexical form.
In this paper, we hypothesize that concepts that are closer in concreteness/abstractness are more likely to colexify, and we test the hypothesis across indigenous languages in the Americas.
- Score: 3.7055269158186874
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Colexification refers to linguistic phenomena where multiple concepts
(meanings) are expressed by the same lexical form, such as polysemy or
homophony. Colexifications have been found to be pervasive across languages and
cultures. The problem of concreteness/abstractness of concepts is
interdisciplinary, studied from a cognitive standpoint in linguistics,
psychology, psycholinguistics, neurophysiology, etc. In this paper, we
hypothesize that concepts that are closer in concreteness/abstractness are more
likely to colexify, and we test the hypothesis across indigenous languages in
the Americas.
Related papers
- Unsupervised Classification of English Words Based on Phonological Information: Discovery of Germanic and Latinate Clusters [9.220284665192663]
Cross-linguistically, native words and loanwords follow different phonological rules.
The Germanic-Latinate distinction in the English lexicon is learnable from the phonotactic information of individual words.
arXiv Detail & Related papers (2025-04-16T05:20:08Z)
- A Grounded Typology of Word Classes [7.201565960962933]
Inspired by information theory, we define "groundedness", an empirical measure of semantic contentfulness.
Our measure captures the contentfulness asymmetry between functional (grammatical) and lexical (content) classes across languages.
We release a dataset of groundedness scores for 30 languages.
arXiv Detail & Related papers (2024-12-13T18:58:48Z)
- Cross-Lingual and Cross-Cultural Variation in Image Descriptions [2.8664758928324883]
We conduct the first large-scale empirical study of cross-lingual variation in image descriptions.
We use a multimodal dataset with 31 languages and images from diverse locations.
Our analysis reveals that pairs of languages that are geographically or genetically closer tend to mention the same entities more frequently.
arXiv Detail & Related papers (2024-09-25T05:57:09Z)
- Patterns of Persistence and Diffusibility across the World's Languages [3.7055269158186874]
Colexification is a type of similarity where a single lexical form is used to convey multiple meanings.
We shed light on the linguistic causes of cross-lingual similarity in colexification and phonology.
We construct large-scale graphs incorporating semantic, genealogical, phonological and geographical data for 1,966 languages.
arXiv Detail & Related papers (2024-01-03T12:05:38Z)
- Colexifications for Bootstrapping Cross-lingual Datasets: The Case of Phonology, Concreteness, and Affectiveness [6.790979602996742]
Colexification refers to the linguistic phenomenon where a single lexical form is used to convey multiple meanings.
We showcase curation procedures which result in a dataset covering 142 languages across 21 language families across the world.
The dataset includes ratings of concreteness and affectiveness, mapped with phonemes and phonological features.
arXiv Detail & Related papers (2023-06-05T07:32:21Z)
- Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models [84.86942006830772]
We conjecture that multilingual pre-trained models can derive language-universal abstractions about grammar.
We conduct the first large-scale empirical study over 43 languages and 14 morphosyntactic categories with a state-of-the-art neuron-level probe.
arXiv Detail & Related papers (2022-05-04T12:22:31Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects on deep neural-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Conceptual similarity and communicative need shape colexification: an experimental study [5.345468714252352]
Colexification refers to the phenomenon of multiple meanings sharing one word in a language.
Communicative needs play an important role in shaping colexification patterns.
This research provides further evidence to support the argument that languages are shaped by the needs and preferences of their speakers.
arXiv Detail & Related papers (2021-03-19T21:18:16Z)
- Constructing a Family Tree of Ten Indo-European Languages with Delexicalized Cross-linguistic Transfer Patterns [57.86480614673034]
We formalize the delexicalized transfer as interpretable tree-to-string and tree-to-tree patterns.
This allows us to quantitatively probe cross-linguistic transfer and extend inquiries of Second Language Acquisition.
arXiv Detail & Related papers (2020-07-17T15:56:54Z)
- The Typology of Polysemy: A Multilingual Distributional Framework [6.753781783859273]
We present a novel framework that quantifies semantic affinity, the cross-linguistic similarity of lexical semantics for a concept.
Our results reveal an intricate interaction between semantic domains and extra-linguistic factors, beyond language phylogeny.
arXiv Detail & Related papers (2020-06-02T22:31:40Z)
- Finding Universal Grammatical Relations in Multilingual BERT [47.74015366712623]
We show that subspaces of mBERT representations recover syntactic tree distances in languages other than English.
We present an unsupervised analysis method that provides evidence mBERT learns representations of syntactic dependency labels.
arXiv Detail & Related papers (2020-05-09T20:46:02Z)
- Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
arXiv Detail & Related papers (2020-01-21T19:09:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.