The Evolution of Language in Social Media Comments
- URL: http://arxiv.org/abs/2406.11450v2
- Date: Tue, 18 Jun 2024 06:30:13 GMT
- Title: The Evolution of Language in Social Media Comments
- Authors: Niccolò Di Marco, Edoardo Loru, Anita Bonetti, Alessandra Olga Grazia Serra, Matteo Cinelli, Walter Quattrociocchi
- Abstract summary: This study investigates the linguistic characteristics of user comments over 34 years, focusing on their complexity and temporal shifts.
We utilize a dataset of approximately 300 million English comments from eight diverse platforms and topics.
Our findings reveal consistent patterns of complexity across social media platforms and topics, characterized by a nearly universal reduction in text length, diminished lexical richness, but decreased repetitiveness.
- Score: 37.69303106863453
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the impact of digital platforms on user behavior presents foundational challenges, including issues related to polarization, misinformation dynamics, and variation in news consumption. Comparative analyses across platforms and over different years can provide critical insights into these phenomena. This study investigates the linguistic characteristics of user comments over 34 years, focusing on their complexity and temporal shifts. Utilizing a dataset of approximately 300 million English comments from eight diverse platforms and topics, we examine the vocabulary size and linguistic richness of user communications and their evolution over time. Our findings reveal consistent patterns of complexity across social media platforms and topics, characterized by a nearly universal reduction in text length, diminished lexical richness, but decreased repetitiveness. Despite these trends, users consistently introduce new words into their comments at a nearly constant rate. This analysis underscores that platforms only partially influence the complexity of user comments. Instead, it reflects a broader, universal pattern of human behaviour, suggesting intrinsic linguistic tendencies of users when interacting online.
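As a rough illustration of the quantities the abstract mentions (text length, lexical richness, repetitiveness, and the rate at which new words appear), the sketch below computes simple stand-ins for each. The specific measures are assumptions made for illustration and are not confirmed by the abstract: the type-token ratio stands in for lexical richness, a zlib compression ratio stands in for repetitiveness, and a running vocabulary set counts new words. The function names and the regex tokenizer are hypothetical.

```python
import re
import zlib


def tokenize(text):
    """Lowercase a comment and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Distinct words divided by total words; 0.0 for an empty comment."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def compression_ratio(text):
    """Compressed size over raw size; lower values suggest more repetition."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw) if raw else 0.0


def new_words_per_comment(comments):
    """For comments in chronological order, count words unseen in earlier comments."""
    seen = set()
    counts = []
    for comment in comments:
        fresh = set(tokenize(comment)) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


# Toy comments, invented for this example only.
comments = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "lol lol lol",
]
for c in comments:
    toks = tokenize(c)
    print(len(toks), round(type_token_ratio(toks), 2))
print(new_words_per_comment(comments))
```

The compression ratio is only informative on longer texts, since very short strings carry fixed compression overhead; in practice it would be applied to concatenated comments or fixed-size samples rather than to single short comments.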
Related papers
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z) - Inside the echo chamber: Linguistic underpinnings of misinformation on Twitter [4.62503518282081]
Social media users drive the spread of misinformation online by sharing posts that include erroneous information or commenting on controversial topics.
This work explores how conversations around misinformation are mediated through language use.
arXiv Detail & Related papers (2024-04-24T15:37:12Z) - Syntactic Language Change in English and German: Metrics, Parsers, and Convergences [56.47832275431858]
The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years.
We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as 4 newer alternatives.
We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
arXiv Detail & Related papers (2024-02-18T11:46:16Z) - Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z) - Exploring Embeddings for Measuring Text Relatedness: Unveiling Sentiments and Relationships in Online Comments [1.7230140898679147]
This paper investigates sentiment and semantic relationships among comments across various social media platforms.
It uses word embeddings to analyze components in sentences and documents.
Our analysis will enable a deeper understanding of the interconnectedness of online comments and will investigate the notion of the internet functioning as a large interconnected brain.
arXiv Detail & Related papers (2023-09-15T04:57:23Z) - Evolving linguistic divergence on polarizing social media [0.0]
We quantify divergence in topics of conversation and word frequencies, messaging sentiment, and lexical semantics of words and emoji.
While US American English remains largely intelligible within its large speech community, our findings point at areas where miscommunication may arise.
arXiv Detail & Related papers (2023-09-04T15:21:55Z) - Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
arXiv Detail & Related papers (2022-12-27T16:08:49Z) - Mental Disorders on Online Social Media Through the Lens of Language and Behaviour: Analysis and Visualisation [7.133136338850781]
We study the factors that characterise and differentiate social media users affected by mental disorders.
Our findings reveal significant differences in the use of function words, such as adverbs and verb tense, and topic-specific vocabulary.
We find evidence suggesting that language use on micro-blogging platforms is less distinguishable for users who have a mental disorder.
arXiv Detail & Related papers (2022-02-07T15:29:01Z) - Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as the informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses how well existing language models distinguish the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z) - How individuals change language [1.2437226707039446]
We introduce a very general mathematical model that encompasses a wide variety of individual-level linguistic behaviours.
We compare the likelihood of empirically-attested changes in definite and indefinite articles in multiple languages under different assumptions.
We find that accounts of language change that appeal primarily to errors in childhood language acquisition are very weakly supported by the historical data.
arXiv Detail & Related papers (2021-04-20T19:02:49Z)