Evolving linguistic divergence on polarizing social media
- URL: http://arxiv.org/abs/2309.01659v1
- Date: Mon, 4 Sep 2023 15:21:55 GMT
- Authors: Andres Karjus, Christine Cuskley
- Abstract summary: We quantify divergence in topics of conversation and word frequencies, messaging sentiment, and lexical semantics of words and emoji. While US American English remains largely intelligible within its large speech community, our findings point at areas where miscommunication may arise.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language change is influenced by many factors, but often starts from
synchronic variation, where multiple linguistic patterns or forms coexist, or
where different speech communities use language in increasingly different ways.
Besides regional or economic reasons, communities may form and segregate based
on political alignment. The latter, referred to as political polarization, is
of growing societal concern across the world. Here we map and quantify
linguistic divergence across the partisan left-right divide in the United
States, using social media data. We develop a general methodology to delineate
(social) media users by their political preference, based on which (potentially
biased) news media accounts they do and do not follow on a given platform. Our
data consists of 1.5M short posts by 10k users (about 20M words) from the
social media platform Twitter (now "X"). Delineating this sample involved
mining the platform for the lists of followers (n=422M) of 72 large news media
accounts. We quantify divergence in topics of conversation and word
frequencies, messaging sentiment, and lexical semantics of words and emoji. We
find signs of linguistic divergence across all these aspects, especially in
topics and themes of conversation, in line with previous research. While US
American English remains largely intelligible within its large speech
community, our findings point at areas where miscommunication may eventually
arise given ongoing polarization and therefore potential linguistic divergence.
Our methodology - combining data mining, lexicostatistics, machine learning,
large language models and a systematic human annotation approach - is largely
language and platform agnostic. In other words, while we focus here on US
political divides and US English, the same approach is applicable to other
countries, languages, and social media platforms.
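The follow-based delineation of users and the comparison of word frequencies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The outlet handles (`outlet_L1`, `outlet_R1`, etc.) are hypothetical placeholders for the 72 news accounts. The labelling rule (follow at least two outlets on one side and none on the other) is an assumed threshold. Jensen-Shannon divergence over whitespace-tokenized word frequencies stands in for whichever divergence measures the paper actually uses.

```python
import math
from collections import Counter

# Hypothetical placeholder handles; the paper's 72 outlets and their leanings are not listed here.
LEFT_OUTLETS = {"outlet_L1", "outlet_L2", "outlet_L3"}
RIGHT_OUTLETS = {"outlet_R1", "outlet_R2", "outlet_R3"}


def label_user(follows, min_follows=2):
    """Return 'left' or 'right' if the user follows at least `min_follows`
    outlets on one side and none on the other; otherwise None (user excluded).
    The threshold is an assumption, not the paper's criterion."""
    n_left = len(follows & LEFT_OUTLETS)
    n_right = len(follows & RIGHT_OUTLETS)
    if n_left >= min_follows and n_right == 0:
        return "left"
    if n_right >= min_follows and n_left == 0:
        return "right"
    return None


def word_distribution(posts):
    """Relative word frequencies from naive lowercase whitespace tokenization."""
    counts = Counter(token for post in posts for token in post.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions,
    1 for distributions with no words in common."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_m(dist):
        # Only words present in `dist` contribute; m[w] > 0 for all of them.
        return sum(prob * math.log2(prob / m[w]) for w, prob in dist.items())

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


if __name__ == "__main__":
    # Toy users: follows and posts are invented for illustration only.
    users = {
        "u1": {"follows": {"outlet_L1", "outlet_L2"}, "posts": ["climate action now", "vote blue"]},
        "u2": {"follows": {"outlet_R1", "outlet_R3"}, "posts": ["border security now", "vote red"]},
        "u3": {"follows": {"outlet_L1", "outlet_R1"}, "posts": ["just the news"]},
    }
    groups = {"left": [], "right": []}
    for user in users.values():
        side = label_user(user["follows"])
        if side is not None:  # users following both sides are dropped
            groups[side].extend(user["posts"])

    score = jensen_shannon(word_distribution(groups["left"]), word_distribution(groups["right"]))
    print(f"JSD between groups: {score:.3f}")  # prints 0.600 for this toy data
```

A real analysis would need proper tokenization (emoji, hashtags, URLs), minimum-count filtering or smoothing, and per-user balancing so that a few prolific accounts do not dominate the group distributions.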
Related papers
- Characterizing the Fragmentation of the Social Media Ecosystem (2024-11-25)
We use a dataset of 126M URLs posted by nearly 6M users on nine social media platforms.
We find a clear separation between mainstream and alt-tech platforms.
These findings outline the main dimensions defining the fragmentation and polarization of the social media ecosystem.
- The Evolution of Language in Social Media Comments (2024-06-17)
This study investigates the linguistic characteristics of user comments over 34 years, focusing on their complexity and temporal shifts.
We utilize a dataset of approximately 300 million English comments from eight diverse platforms and topics.
Our findings reveal consistent patterns of complexity across social media platforms and topics, characterized by a nearly universal reduction in text length, diminished lexical richness, but decreased repetitiveness.
- Quantifying the Dialect Gap and its Correlates Across Languages (2023-10-23)
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
- Moral consensus and divergence in partisan language use (2023-10-14)
Polarization has increased substantially in political discourse, contributing to a widening partisan divide.
We analyzed large-scale, real-world language use in Reddit communities and in news outlets to uncover psychological dimensions along which partisan language is divided.
- Language statistics at different spatial, temporal, and grammatical scales (2022-07-02)
We use data from Twitter to explore the rank diversity at different scales.
The greatest changes come from variations in the grammatical scale.
As the grammatical scale grows, the rank diversity curves vary more depending on the temporal and spatial scales.
- Reaching the bubble may not be enough: news media role in online political polarization (2021-09-18)
A way of reducing polarization would be by distributing cross-partisan news among individuals with distinct political orientations.
This study investigates whether this holds in the context of nationwide elections in Brazil and Canada.
- Revealing Persona Biases in Dialogue Systems (2021-04-18)
We present the first large-scale study on persona biases in dialogue systems.
We conduct analyses on personas of different social classes, sexual orientations, races, and genders.
In our studies of the Blender and DialoGPT dialogue systems, we show that the choice of personas can affect the degree of harms in generated responses.
- Exploring Polarization of Users Behavior on Twitter During the 2019 South American Protests (2021-04-05)
We explore polarization on Twitter in a different context, namely the protest that paralyzed several countries in the South American region in 2019.
By leveraging users' endorsement of politicians' tweets and hashtag campaigns with defined stances towards the protest (for or against), we construct a weakly labeled stance dataset with millions of users.
We find empirical evidence of the "filter bubble" phenomenon during the event, as we not only show that the user bases are homogeneous in terms of stance, but the probability that a user transitions from media of different clusters is low.
- Discovering and Categorising Language Biases in Reddit (2020-08-06)
This paper proposes a data-driven approach to automatically discover language biases encoded in the vocabulary of online discourse communities on Reddit.
We use word embeddings to transform text into high-dimensional dense vectors and capture semantic relations between words.
We successfully discover gender bias, religion bias, and ethnic bias in different Reddit communities.
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer (2020-05-02)
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
- Echo Chambers on Social Media: A comparative analysis (2020-04-20)
We introduce an operational definition of echo chambers and perform a massive comparative analysis on 1B pieces of contents produced by 1M users on four social media platforms.
We infer the leaning of users about controversial topics and reconstruct their interaction networks by analyzing different features.
We find support for the hypothesis that platforms implementing news feed algorithms like Facebook may elicit the emergence of echo-chambers.
This list is automatically generated from the titles and abstracts of the papers on this site.