From sunblock to softblock: Analyzing the correlates of neology in published writing and on social media
- URL: http://arxiv.org/abs/2602.13123v1
- Date: Fri, 13 Feb 2026 17:19:28 GMT
- Title: From sunblock to softblock: Analyzing the correlates of neology in published writing and on social media
- Authors: Maria Ryskina, Matthew R. Gormley, Kyle Mahowald, David R. Mortensen, Taylor Berg-Kirkpatrick, Vivek Kulkarni
- Abstract summary: We show that the topic popularity growth factor may contribute less to neology on Twitter than in published writing. We hypothesize that this difference can be explained by the two domains favouring different neologism formation mechanisms.
- Score: 43.51412860640503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Living languages are shaped by a host of conflicting internal and external evolutionary pressures. While some of these pressures are universal across languages and cultures, others differ depending on the social and conversational context: language use in newspapers is subject to very different constraints than language use on social media. Prior distributional semantic work on English word emergence (neology) identified two factors correlated with creation of new words by analyzing a corpus consisting primarily of historical published texts (Ryskina et al., 2020, arXiv:2001.07740). Extending this methodology to contextual embeddings in addition to static ones and applying it to a new corpus of Twitter posts, we show that the same findings hold for both domains, though the topic popularity growth factor may contribute less to neology on Twitter than in published writing. We hypothesize that this difference can be explained by the two domains favouring different neologism formation mechanisms.
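The abstract builds on two distributional correlates of neology from Ryskina et al. (2020): the semantic sparsity of a word's neighborhood and the frequency growth rate of its semantic neighbors. Below is a minimal sketch of one way these two factors might be operationalized with static embeddings; the function names, the top-k choice, and the toy vectors are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the paper's code) of the two neology correlates:
# (1) semantic sparsity of a region in embedding space, and
# (2) frequency growth of a semantic neighbor across two time periods.
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def semantic_sparsity(target_vec, neighbor_vecs, k=5):
    # Lower mean similarity to the k nearest neighbors => sparser region.
    sims = sorted((cosine(target_vec, n) for n in neighbor_vecs), reverse=True)
    k = min(k, len(sims))
    return 1.0 - sum(sims[:k]) / k  # 1 minus mean top-k similarity

def neighbor_growth_rate(freq_t0, freq_t1):
    # Smoothed log frequency growth of a neighbor between two periods.
    return float(np.log((freq_t1 + 1) / (freq_t0 + 1)))

# Toy data: random vectors stand in for word embeddings.
rng = np.random.default_rng(0)
target = rng.normal(size=50)
neighbors = rng.normal(size=(20, 50))
print(round(semantic_sparsity(target, neighbors), 3))
print(round(neighbor_growth_rate(10, 40), 3))
```

The `1 - mean similarity` formulation is just one plausible sparsity proxy; the paper's actual measure may differ.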
Related papers
- Entropy and type-token ratio in gigaword corpora [0.0]
Lexical diversity is characterized in terms of the type-token ratio and the word entropy. We investigate both diversity metrics in six massive linguistic datasets in English, Spanish, and Turkish. We unveil an empirical functional relation between entropy and type-token ratio of texts of a given corpus and language.
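The two diversity metrics named in this summary can be sketched in a few lines; the whitespace tokenizer and the toy sentence are simplifying assumptions.

```python
# Minimal sketch of type-token ratio and word entropy on a toy token list.
from collections import Counter
import math

def type_token_ratio(tokens):
    # Distinct word types divided by total tokens.
    return len(set(tokens)) / len(tokens)

def word_entropy(tokens):
    # Shannon entropy (bits) of the empirical word distribution.
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

tokens = "the cat sat on the mat the cat ran".split()
print(type_token_ratio(tokens))          # 6 types / 9 tokens ≈ 0.667
print(round(word_entropy(tokens), 3))    # ≈ 2.419
```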
arXiv Detail & Related papers (2024-11-15T14:40:59Z)
- Evolving linguistic divergence on polarizing social media [0.0]
We quantify divergence in topics of conversation and word frequencies, messaging sentiment, and lexical semantics of words and emoji.
While US American English remains largely intelligible within its large speech community, our findings point at areas where miscommunication may arise.
arXiv Detail & Related papers (2023-09-04T15:21:55Z)
- CommunityFish: A Poisson-based Document Scaling With Hierarchical Clustering [0.0]
This paper introduces CommunityFish, an augmented version of Wordfish that applies hierarchical clustering (the Louvain algorithm) to the word space, yielding communities of semantically related, independent n-grams emerging from the corpus.
This strategy emphasizes the interpretability of the results: since communities have a non-overlapping structure, they carry crucial informative power for discriminating parties or speakers, while also allowing faster execution of the Poisson scaling model.
arXiv Detail & Related papers (2023-08-28T19:52:18Z)
- The Causal Structure of Semantic Ambiguities [0.0]
We identify two features: (1) joint plausibility degrees of different possible interpretations, and (2) causal structures according to which certain words play a more substantial role in the processes.
We applied this theory to a dataset of ambiguous phrases extracted from the psycholinguistics literature, together with human plausibility judgements that we collected.
arXiv Detail & Related papers (2022-06-14T12:56:34Z)
- An Informational Space Based Semantic Analysis for Scientific Texts [62.997667081978825]
This paper introduces computational methods for semantic analysis and for quantifying the meaning of short scientific texts.
The representation of scientific-specific meaning is standardised by relying on situation representations rather than psychological properties.
The research in this paper lays the groundwork for a geometric representation of the meaning of texts.
arXiv Detail & Related papers (2022-05-31T11:19:32Z)
- Quantifying Synthesis and Fusion and their Impact on Machine Translation [79.61874492642691]
In Natural Language Processing (NLP), a whole language is typically labelled with a strict type of morphology, e.g. fusional or agglutinative.
In this work, we propose to reduce the rigidity of such claims, by quantifying morphological typology at the word and segment level.
For synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as a case study.
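The word-level quantification described here can be illustrated with a degree-of-synthesis measure computed as morpheme segments per word. This is a toy sketch under assumptions: the hand-made segmentations below are illustrative, not output of the segmentation methods the paper tests.

```python
# Illustrative word-level synthesis index: mean morpheme segments per word.
def synthesis_index(segmented_words):
    # Each word is given as a list of its morpheme segments.
    return sum(len(segs) for segs in segmented_words) / len(segmented_words)

# Toy, hand-made segmentations (assumptions for illustration only).
english = [["un", "break", "able"], ["cat", "s"], ["run"]]
turkish = [["ev", "ler", "imiz", "den"], ["gel", "iyor", "um"]]
print(synthesis_index(english))  # (3 + 2 + 1) / 3 = 2.0
print(synthesis_index(turkish))  # (4 + 3) / 2 = 3.5
```

On this toy data the more agglutinative segmentations score higher, which is the intuition behind quantifying typology at the word level rather than per language.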
Then, we analyse the relationship between machine translation quality and the degree of synthesis and fusion at word (nouns and verbs for English-Turkish,
arXiv Detail & Related papers (2022-05-06T17:04:58Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study presents an assessment of existing language models for distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Lexical semantic change for Ancient Greek and Latin [61.69697586178796]
Associating a word's correct meaning in its historical context is a central challenge in diachronic research.
We build on a recent computational approach to semantic change based on a dynamic Bayesian mixture model.
We provide a systematic comparison of dynamic Bayesian mixture models for semantic change with state-of-the-art embedding-based models.
arXiv Detail & Related papers (2021-01-22T12:04:08Z)
- Learning language variations in news corpora through differential embeddings [0.0]
We show that a model with a central word representation and a slice-dependent contribution can learn word embeddings from different corpora simultaneously.
We show that it can capture both temporal dynamics in the yearly slices of each corpus, and language variations between US and UK English in a curated multi-source corpus.
arXiv Detail & Related papers (2020-11-13T14:50:08Z)
- Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
arXiv Detail & Related papers (2020-01-21T19:09:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.