Related papers: Evolving linguistic divergence on polarizing social media

URL: http://arxiv.org/abs/2309.01659v1
Date: Mon, 4 Sep 2023 15:21:55 GMT
Title: Evolving linguistic divergence on polarizing social media
Authors: Andres Karjus, Christine Cuskley
Abstract summary: We quantify divergence in topics of conversation and word frequencies, messaging sentiment, and lexical semantics of words and emoji. While US American English remains largely intelligible within its large speech community, our findings point at areas where miscommunication may arise.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Language change is influenced by many factors, but often starts from synchronic variation, where multiple linguistic patterns or forms coexist, or where different speech communities use language in increasingly different ways. Besides regional or economic reasons, communities may form and segregate based on political alignment. The latter, referred to as political polarization, is of growing societal concern across the world. Here we map and quantify linguistic divergence across the partisan left-right divide in the United States, using social media data. We develop a general methodology to delineate (social) media users by their political preference, based on which (potentially biased) news media accounts they do and do not follow on a given platform. Our data consists of 1.5M short posts by 10k users (about 20M words) from the social media platform Twitter (now "X"). Delineating this sample involved mining the platform for the lists of followers (n=422M) of 72 large news media accounts. We quantify divergence in topics of conversation and word frequencies, messaging sentiment, and lexical semantics of words and emoji. We find signs of linguistic divergence across all these aspects, especially in topics and themes of conversation, in line with previous research. While US American English remains largely intelligible within its large speech community, our findings point at areas where miscommunication may eventually arise given ongoing polarization and therefore potential linguistic divergence. Our methodology, which combines data mining, lexicostatistics, machine learning, large language models, and a systematic human annotation approach, is largely language- and platform-agnostic. In other words, while we focus here on US political divides and US English, the same approach is applicable to other countries, languages, and social media platforms.

Related papers

Detecting Linguistic Diversity on Social Media [1.3108652488669732]
We use published census data as the ground truth and the social media sub-corpus from the Corpus of Global Language Use as our alternative data source. We identify the language conditions of each tweet in the social media data set and validate our results with two language identification models. The results suggest that social media language data can provide a rich source of spatial and temporal insights into the linguistic profile of a place.
arXiv Detail & Related papers (2025-02-28T16:56:34Z)
Characterizing the Fragmentation of the Social Media Ecosystem [39.58317527488534]
We use a dataset of 126M URLs posted by nearly 6M users on nine social media platforms. We find a clear separation between mainstream and alt-tech platforms. These findings outline the main dimensions defining the fragmentation and polarization of the social media ecosystem.
arXiv Detail & Related papers (2024-11-25T18:45:03Z)
The Evolution of Language in Social Media Comments [37.69303106863453]
This study investigates the linguistic characteristics of user comments over 34 years, focusing on their complexity and temporal shifts. We utilize a dataset of approximately 300 million English comments from eight diverse platforms and topics. Our findings reveal consistent patterns of complexity across social media platforms and topics, characterized by a nearly universal reduction in text length and diminished lexical richness, but also decreased repetitiveness.
arXiv Detail & Related papers (2024-06-17T12:03:30Z)
Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
Moral consensus and divergence in partisan language use [0.0]
Polarization has increased substantially in political discourse, contributing to a widening partisan divide. We analyzed large-scale, real-world language use in Reddit communities and in news outlets to uncover psychological dimensions along which partisan language is divided.
arXiv Detail & Related papers (2023-10-14T16:50:26Z)
Language statistics at different spatial, temporal, and grammatical scales [48.7576911714538]
We use data from Twitter to explore the rank diversity at different scales. The greatest changes come from variations in the grammatical scale. As the grammatical scale grows, the rank diversity curves vary more depending on the temporal and spatial scales.
arXiv Detail & Related papers (2022-07-02T01:38:48Z)
Reaching the bubble may not be enough: news media role in online political polarization [58.720142291102135]
One way to reduce polarization would be to distribute cross-partisan news among individuals with distinct political orientations. This study investigates whether this holds in the context of nationwide elections in Brazil and Canada.
arXiv Detail & Related papers (2021-09-18T11:34:04Z)
Revealing Persona Biases in Dialogue Systems [64.96908171646808]
We present the first large-scale study on persona biases in dialogue systems. We conduct analyses on personas of different social classes, sexual orientations, races, and genders. In our studies of the Blender and DialoGPT dialogue systems, we show that the choice of personas can affect the degree of harms in generated responses.
arXiv Detail & Related papers (2021-04-18T05:44:41Z)
Exploring Polarization of Users Behavior on Twitter During the 2019 South American Protests [15.065938163384235]
We explore polarization on Twitter in a different context, namely the protests that paralyzed several countries in the South American region in 2019. By leveraging users' endorsement of politicians' tweets and hashtag campaigns with defined stances towards the protests (for or against), we construct a weakly labeled stance dataset with millions of users. We find empirical evidence of the "filter bubble" phenomenon during the event: we show not only that the user bases are homogeneous in terms of stance, but also that the probability that a user transitions between media of different clusters is low.
arXiv Detail & Related papers (2021-04-05T07:13:18Z)
Discovering and Categorising Language Biases in Reddit [5.670038395203354]
This paper proposes a data-driven approach to automatically discover language biases encoded in the vocabulary of online discourse communities on Reddit. We use word embeddings to transform text into high-dimensional dense vectors and capture semantic relations between words. We successfully discover gender bias, religion bias, and ethnic bias in different Reddit communities.
arXiv Detail & Related papers (2020-08-06T16:42:10Z)
Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
Echo Chambers on Social Media: A comparative analysis [64.2256216637683]
We introduce an operational definition of echo chambers and perform a massive comparative analysis on 1B pieces of content produced by 1M users on four social media platforms. We infer the leaning of users about controversial topics and reconstruct their interaction networks by analyzing different features. We find support for the hypothesis that platforms implementing news feed algorithms like Facebook may elicit the emergence of echo chambers.
arXiv Detail & Related papers (2020-04-20T20:00:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.