Subdiffusive semantic evolution in Indo-European languages
- URL: http://arxiv.org/abs/2209.04701v1
- Date: Sat, 10 Sep 2022 15:57:32 GMT
- Title: Subdiffusive semantic evolution in Indo-European languages
- Authors: Bogdán Asztalos, Gergely Palla, Dániel Czégel
- Abstract summary: We find that semantic evolution is strongly subdiffusive across five major Indo-European languages.
We show that words follow trajectories in meaning space with an anomalous diffusion exponent.
We furthermore show that strong subdiffusion is a robust phenomenon under a wide variety of choices in data analysis and interpretation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How do words change their meaning? Although semantic evolution is driven by a
variety of distinct factors, including linguistic, societal, and technological
ones, we find that there is one law that holds universally across five major
Indo-European languages: that semantic evolution is strongly subdiffusive.
Using an automated pipeline of diachronic distributional semantic embedding
that controls for underlying symmetries, we show that words follow stochastic
trajectories in meaning space with an anomalous diffusion exponent
$\alpha = 0.45 \pm 0.05$ across languages, in contrast with diffusing particles that
follow $\alpha=1$. Randomization methods indicate that preserving temporal
correlations in semantic change directions is necessary to recover strongly
subdiffusive behavior; however, correlations in change sizes play an important
role too. We furthermore show that strong subdiffusion is a robust phenomenon
under a wide variety of choices in data analysis and interpretation, such as
the choice of fitting an ensemble average of displacements or averaging
best-fit exponents of individual word trajectories.
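As an illustration of the two fitting choices named in the abstract, here is a minimal sketch of how an anomalous diffusion exponent $\alpha$ can be estimated from trajectories, assuming the usual scaling of the mean squared displacement, MSD(t) ~ t^$\alpha$. The function names (`squared_displacements`, `ensemble_exponent`, `mean_individual_exponent`), the array layout, and the synthetic random-walk data are assumptions made for this example and are not the authors' pipeline. Ordinary random walks are used as input, so both estimates should come out near the diffusive baseline $\alpha=1$ rather than the subdiffusive value reported in the paper.

```python
import numpy as np


def squared_displacements(trajs):
    """Squared distance from the starting point at every later time step.

    trajs has shape (n_words, n_times, dim); the result has shape
    (n_words, n_times - 1).
    """
    disp = trajs[:, 1:, :] - trajs[:, :1, :]
    return (disp ** 2).sum(axis=2)


def ensemble_exponent(trajs):
    """Fit alpha to the ensemble-averaged squared displacement (log-log slope)."""
    msd = squared_displacements(trajs).mean(axis=0)
    t = np.arange(1, trajs.shape[1])
    slope, _intercept = np.polyfit(np.log(t), np.log(msd), 1)
    return float(slope)


def mean_individual_exponent(trajs):
    """Fit one exponent per trajectory, then average the best-fit exponents."""
    sq = squared_displacements(trajs)
    t = np.arange(1, trajs.shape[1])
    slopes = [np.polyfit(np.log(t), np.log(row), 1)[0] for row in sq]
    return float(np.mean(slopes))


# Synthetic stand-in for word trajectories (hypothetical data): ordinary
# Gaussian random walks, i.e. uncorrelated steps in a 10-dimensional space.
rng = np.random.default_rng(0)
n_words, n_times, dim = 500, 40, 10
steps = rng.normal(size=(n_words, n_times - 1, dim))
start = np.zeros((n_words, 1, dim))
trajs = np.concatenate([start, np.cumsum(steps, axis=1)], axis=1)

alpha_ensemble = ensemble_exponent(trajs)
alpha_individual = mean_individual_exponent(trajs)
print(f"ensemble-average fit:        alpha = {alpha_ensemble:.2f}")
print(f"mean of individual exponents: alpha = {alpha_individual:.2f}")
```

Applied to real diachronic embeddings, the same two functions would give the two estimates the abstract compares. An exponent well below 1 from both would indicate subdiffusion.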
Related papers
- Semantic Cells: Evolutional Process to Acquire Sense Diversity of Items [0.0]
The author presents a method in which a word or item embraces multiple semantic vectors that evolve via interaction with others.
We obtained two preliminary results: the role of a word that evolves to acquire the largest or lower-middle variance of semantic vectors tends to be explainable.
The epicenters of earthquakes that acquire larger variance via crossover, corresponding to the interaction with diverse areas of land crust, are likely to correspond to the epicenters of forthcoming large earthquakes.
arXiv Detail & Related papers (2024-04-23T05:11:08Z)
- Unsupervised Semantic Variation Prediction using the Distribution of Sibling Embeddings [17.803726860514193]
Detection of semantic variation of words is an important task for various NLP applications.
We argue that mean representations alone cannot accurately capture such semantic variations.
We propose a method that uses the entire cohort of the contextualised embeddings of the target word.
arXiv Detail & Related papers (2023-05-15T13:58:21Z)
- Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer? [50.48082721476612]
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability.
We investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages.
arXiv Detail & Related papers (2022-12-21T09:44:08Z)
- Crosslinguistic word order variation reflects evolutionary pressures of dependency and information locality [4.869029215261254]
About 40% of the world's languages have subject-verb-object order, and about 40% have subject-object-verb order.
We show that variation in word order reflects different ways of balancing competing pressures of dependency locality and information locality.
Our findings suggest that syntactic structure and usage across languages co-adapt to support efficient communication under limited cognitive resources.
arXiv Detail & Related papers (2022-06-09T02:56:53Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show NDD to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Grammatical Profiling for Semantic Change Detection [6.3596637237946725]
We use grammatical profiling as an alternative method for semantic change detection.
We demonstrate that it can be used for semantic change detection and even outperforms some distributional semantic methods.
arXiv Detail & Related papers (2021-09-21T18:38:18Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- Investigating Cross-Linguistic Adjective Ordering Tendencies with a Latent-Variable Model [66.84264870118723]
We present the first purely corpus-driven model of multi-lingual adjective ordering in the form of a latent-variable model.
We provide strong converging evidence for the existence of universal, cross-linguistic, hierarchical adjective ordering tendencies.
arXiv Detail & Related papers (2020-10-09T18:27:55Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
- Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
arXiv Detail & Related papers (2020-01-21T19:09:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.