Where New Words Are Born: Distributional Semantic Analysis of Neologisms
and Their Semantic Neighborhoods
- URL: http://arxiv.org/abs/2001.07740v1
- Date: Tue, 21 Jan 2020 19:09:49 GMT
- Title: Where New Words Are Born: Distributional Semantic Analysis of Neologisms
and Their Semantic Neighborhoods
- Authors: Maria Ryskina, Ella Rabinovich, Taylor Berg-Kirkpatrick, David R.
Mortensen, Yulia Tsvetkov
- Abstract summary: We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
- Score: 51.34667808471513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We perform statistical analysis of the phenomenon of neology, the process by
which new words emerge in a language, using large diachronic corpora of
English. We investigate the importance of two factors, semantic sparsity and
frequency growth rates of semantic neighbors, formalized in the distributional
semantics paradigm. We show that both factors are predictive of word emergence
although we find more support for the latter hypothesis. Besides presenting a
new linguistic application of distributional semantics, this study tackles the
linguistic question of the role of language-internal factors (in our case,
sparsity) in language change motivated by language-external factors (reflected
in frequency growth).
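The two factors named in the abstract can be illustrated with a small sketch. This is only one possible operationalization, under stated assumptions, and not necessarily the authors' procedure: here semantic sparsity is approximated by counting the words whose embedding lies within a cosine-similarity threshold of a target word, and neighbor frequency growth by the average least-squares slope of the neighbors' frequency series. The function names, the threshold value, and the toy embeddings and frequencies are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def neighbors(target, embeddings, threshold):
    """Words whose cosine similarity to `target` is at least `threshold`."""
    t = embeddings[target]
    return [
        w for w, v in embeddings.items()
        if w != target and cosine(t, v) >= threshold
    ]


def growth_slope(freqs):
    """Least-squares slope of a frequency series over equally spaced periods.

    Assumes the series covers at least two periods.
    """
    n = len(freqs)
    mean_x = (n - 1) / 2
    mean_y = sum(freqs) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(freqs))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def neighborhood_stats(target, embeddings, freq_series, threshold=0.8):
    """Return (neighbor count, mean frequency slope of the neighbors)."""
    near = neighbors(target, embeddings, threshold)
    density = len(near)  # fewer neighbors means a sparser neighborhood
    if not near:
        return density, 0.0
    avg_growth = sum(growth_slope(freq_series[w]) for w in near) / density
    return density, avg_growth


# Toy data, invented purely for illustration.
embeddings = {
    "email": [1.0, 0.1],
    "letter": [0.9, 0.2],
    "fax": [0.95, 0.05],
    "banana": [0.0, 1.0],
}
freq_series = {
    "letter": [5.0, 5.0, 5.0],
    "fax": [1.0, 2.0, 3.0],
    "banana": [2.0, 2.0, 2.0],
}

density, avg_growth = neighborhood_stats("email", embeddings, freq_series)
print(density, avg_growth)
```

In this toy setup, a low `density` would correspond to a sparse neighborhood and a positive `avg_growth` to neighbors whose usage is rising; the threshold of 0.8 is an arbitrary choice for the example.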
Related papers
- Patterns of Persistence and Diffusibility across the World's Languages [3.7055269158186874]
Colexification is a type of similarity where a single lexical form is used to convey multiple meanings.
We shed light on the linguistic causes of cross-lingual similarity in colexification and phonology.
We construct large-scale graphs incorporating semantic, genealogical, phonological and geographical data for 1,966 languages.
arXiv Detail & Related papers (2024-01-03T12:05:38Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Agentività e telicità in GilBERTo: implicazioni cognitive [77.71680953280436]
The goal of this study is to investigate whether a Transformer-based neural language model infers lexical semantics.
The semantic properties considered are telicity (also combined with definiteness) and agentivity.
arXiv Detail & Related papers (2023-07-06T10:52:22Z)
- Subdiffusive semantic evolution in Indo-European languages [0.0]
We find that semantic evolution is strongly subdiffusive across five major Indo-European languages.
We show that words follow trajectories in meaning space with an anomalous diffusion exponent.
We furthermore show that strong subdiffusion is a robust phenomenon under a wide variety of choices in data analysis and interpretation.
arXiv Detail & Related papers (2022-09-10T15:57:32Z)
- The Causal Structure of Semantic Ambiguities [0.0]
We identify two features: (1) joint plausibility degrees of different possible interpretations, and (2) causal structures according to which certain words play a more substantial role in the processes.
We applied this theory to a dataset of ambiguous phrases extracted from the psycholinguistics literature, together with human plausibility judgments that we collected.
arXiv Detail & Related papers (2022-06-14T12:56:34Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Quantifying Cognitive Factors in Lexical Decline [2.4424095531386234]
We propose a variety of psycholinguistic factors -- semantic, distributional, and phonological -- that we hypothesize are predictive of lexical decline.
We find that most of our proposed factors show a significant difference in the expected direction between each curated set of declining words and their matched stable words.
Further diachronic analysis reveals that declining words tend to decrease in the diversity of their lexical contexts over time, gradually narrowing their 'ecological niches'.
arXiv Detail & Related papers (2021-10-12T07:12:56Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Decomposing lexical and compositional syntax and semantics with deep language models [82.81964713263483]
The activations of language transformers like GPT2 have been shown to linearly map onto brain activity during speech comprehension.
Here, we propose a taxonomy to factorize the high-dimensional activations of language models into four classes: lexical, compositional, syntactic, and semantic representations.
The results highlight two findings. First, compositional representations recruit a more widespread cortical network than lexical ones, and encompass the bilateral temporal, parietal and prefrontal cortices.
arXiv Detail & Related papers (2021-03-02T10:24:05Z)
- The Typology of Polysemy: A Multilingual Distributional Framework [6.753781783859273]
We present a novel framework that quantifies semantic affinity, the cross-linguistic similarity of lexical semantics for a concept.
Our results reveal an intricate interaction between semantic domains and extra-linguistic factors, beyond language phylogeny.
arXiv Detail & Related papers (2020-06-02T22:31:40Z)