Sampled Datasets Risk Substantial Bias in the Identification of Political Polarization on Social Media
- URL: http://arxiv.org/abs/2406.19867v1
- Date: Fri, 28 Jun 2024 12:13:29 GMT
- Title: Sampled Datasets Risk Substantial Bias in the Identification of Political Polarization on Social Media
- Authors: Gabriele Di Bona, Emma Fraxanet, Björn Komander, Andrea Lo Sasso, Virginia Morini, Antoine Vendeville, Max Falkenberg, Alessandro Galeazzi,
- Abstract summary: We study the structural polarization of the Polish political debate on Twitter over a 24-hour period.
Large samples can be representative of the whole political discussion on a platform, but small samples consistently fail to accurately reflect the true structure of polarization online.
- Score: 34.192291430580454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Following recent policy changes by X (Twitter) and other social media platforms, user interaction data has become increasingly difficult to access. These restrictions are impeding robust research pertaining to social and political phenomena online, which is critical due to the profound impact social media platforms may have on our societies. Here, we investigate the reliability of polarization measures obtained from different samples of social media data by studying the structural polarization of the Polish political debate on Twitter over a 24-hour period. First, we show that the political discussion on Twitter is only a small subset of the wider Twitter discussion. Second, we find that large samples can be representative of the whole political discussion on a platform, but small samples consistently fail to accurately reflect the true structure of polarization online. Finally, we demonstrate that keyword-based samples can be representative if keywords are selected with great care, but that poorly selected keywords can result in substantial political bias in the sampled data. Our findings demonstrate that it is not possible to measure polarization in a reliable way with small, sampled datasets, highlighting why the current lack of research data is so problematic, and providing insight into the practical implementation of the European Union's Digital Service Act which aims to improve researchers' access to social media data.
Related papers
- On the Use of Proxies in Political Ad Targeting [49.61009579554272]
We show that major political advertisers circumvented mitigations by targeting proxy attributes.
Our findings have crucial implications for the ongoing discussion on the regulation of political advertising.
arXiv Detail & Related papers (2024-10-18T17:15:13Z) - Can LLMs Help Predict Elections? (Counter)Evidence from the World's Largest Democracy [3.0915192911449796]
The study of how social media affects the formation of public opinion and its influence on political results has been a popular field of inquiry.
We introduce a new method: harnessing the capabilities of Large Language Models (LLMs) to examine social media data and forecast election outcomes.
arXiv Detail & Related papers (2024-05-13T15:13:23Z) - Modeling Political Orientation of Social Media Posts: An Extended
Analysis [0.0]
Developing machine learning models to characterize political polarization on online social media presents significant challenges.
These challenges mainly stem from various factors such as the lack of annotated data, presence of noise in social media datasets, and the sheer volume of data.
We introduce two methods that leverage on news media bias and post content to label social media posts.
We demonstrate that current machine learning models can exhibit improved performance in predicting political orientation of social media posts.
arXiv Detail & Related papers (2023-11-21T03:34:20Z) - Quantitative Analysis of Forecasting Models:In the Aspect of Online
Political Bias [0.0]
We propose a approach to classify social media posts into five distinct political leaning categories.
Our approach involves utilizing existing time series forecasting models on two social media datasets with different political ideologies.
arXiv Detail & Related papers (2023-09-11T16:17:24Z) - Unveiling the Hidden Agenda: Biases in News Reporting and Consumption [59.55900146668931]
We build a six-year dataset on the Italian vaccine debate and adopt a Bayesian latent space model to identify narrative and selection biases.
We found a nonlinear relationship between biases and engagement, with higher engagement for extreme positions.
Analysis of news consumption on Twitter reveals common audiences among news outlets with similar ideological positions.
arXiv Detail & Related papers (2023-01-14T18:58:42Z) - Design and analysis of tweet-based election models for the 2021 Mexican
legislative election [55.41644538483948]
We use a dataset of 15 million election-related tweets in the six months preceding election day.
We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods.
arXiv Detail & Related papers (2023-01-02T12:40:05Z) - Fast Few shot Self-attentive Semi-supervised Political Inclination
Prediction [12.472629584751509]
It is increasingly common now for policymakers/journalists to create online polls on social media to understand the political leanings of people in specific locations.
We introduce a self-attentive semi-supervised framework for political inclination detection to further that objective.
We found that the model is highly efficient even in resource-constrained settings.
arXiv Detail & Related papers (2022-09-21T12:07:16Z) - Reaching the bubble may not be enough: news media role in online
political polarization [58.720142291102135]
A way of reducing polarization would be by distributing cross-partisan news among individuals with distinct political orientations.
This study investigates whether this holds in the context of nationwide elections in Brazil and Canada.
arXiv Detail & Related papers (2021-09-18T11:34:04Z) - Two-Faced Humans on Twitter and Facebook: Harvesting Social Multimedia
for Human Personality Profiling [74.83957286553924]
We infer the Myers-Briggs Personality Type indicators by applying a novel multi-view fusion framework, called "PERS"
Our experimental results demonstrate the PERS's ability to learn from multi-view data for personality profiling by efficiently leveraging on the significantly different data arriving from diverse social multimedia sources.
arXiv Detail & Related papers (2021-06-20T10:48:49Z) - TIMME: Twitter Ideology-detection via Multi-task Multi-relational
Embedding [26.074367752142198]
We aim at solving the problem of predicting people's ideology, or political tendency.
We estimate it by using Twitter data, and formalize it as a classification problem.
arXiv Detail & Related papers (2020-06-02T00:00:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.