Does Twitter know your political views? POLiTweets dataset and
semi-automatic method for political leaning discovery
- URL: http://arxiv.org/abs/2207.07586v1
- Date: Tue, 14 Jun 2022 10:28:23 GMT
- Title: Does Twitter know your political views? POLiTweets dataset and
semi-automatic method for political leaning discovery
- Authors: Joanna Baran, Michał Kajstura, Maciej Ziółkowski, Krzysztof Rajda
- Abstract summary: POLiTweets is the first publicly open Polish dataset for political affiliation discovery in a multiparty setup.
It consists of over 147k tweets from almost 10k Polish-writing users annotated heuristically, plus almost 40k tweets from 166 users annotated manually as a test set.
We used our data to study the aspects of domain shift in the context of topics and the type of content writers - ordinary citizens vs. professional politicians.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Every day, the world is flooded by millions of messages and statements posted
on Twitter or Facebook. Social media platforms try to protect users' personal
data, but there is still a real risk of misuse, including election
manipulation. Did you know that only 13 posts addressing important or
controversial topics for society are enough to predict one's political
affiliation with a 0.85 F1-score? To examine this phenomenon, we created a
novel universal method of semi-automated political leaning discovery. It relies
on a heuristic data annotation procedure, which was evaluated to achieve 0.95
agreement with human annotators (counted as an accuracy metric). We also
present POLiTweets - the first publicly open Polish dataset for political
affiliation discovery in a multi-party setup, consisting of over 147k tweets
from almost 10k Polish-writing users annotated heuristically and almost 40k
tweets from 166 users annotated manually as a test set. We used our data to
study the aspects of domain shift in the context of topics and the type of
content writers - ordinary citizens vs. professional politicians.
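The abstract reports agreement between the heuristic annotation and human annotators "counted as an accuracy metric". A minimal sketch of that kind of calculation follows, assuming agreement is the fraction of items on which both label sources coincide. The function name `agreement_accuracy` and the toy labels are hypothetical and are not taken from the paper or from POLiTweets.

```python
from typing import Sequence


def agreement_accuracy(heuristic: Sequence[str], manual: Sequence[str]) -> float:
    """Fraction of items where the heuristic label equals the human label."""
    if len(heuristic) != len(manual):
        raise ValueError("label sequences must have the same length")
    if not heuristic:
        raise ValueError("label sequences must not be empty")
    matches = sum(h == m for h, m in zip(heuristic, manual))
    return matches / len(heuristic)


# Toy labels, invented for illustration only (not from POLiTweets).
heuristic_labels = ["party_a", "party_b", "party_a", "party_c", "party_b"]
manual_labels = ["party_a", "party_b", "party_c", "party_c", "party_b"]

# 4 of the 5 toy labels match.
print(agreement_accuracy(heuristic_labels, manual_labels))
```

Whether the paper computes agreement per tweet or per user is not stated in the abstract, so the sketch treats each label pair as one item.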
Related papers
- On the Use of Proxies in Political Ad Targeting [49.61009579554272]
We show that major political advertisers circumvented mitigations by targeting proxy attributes.
Our findings have crucial implications for the ongoing discussion on the regulation of political advertising.
arXiv Detail & Related papers (2024-10-18T17:15:13Z)
- Electoral Agitation Data Set: The Use Case of the Polish Election [3.671887117122512]
We present the first publicly open data set for detecting electoral agitation in the Polish language.
It contains 6,112 human-annotated tweets tagged with four legally conditioned categories.
The newly created data set was used to fine-tune a Polish Language Model called HerBERT.
arXiv Detail & Related papers (2023-07-13T18:14:43Z)
- Design and analysis of tweet-based election models for the 2021 Mexican legislative election [55.41644538483948]
We use a dataset of 15 million election-related tweets in the six months preceding election day.
We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods.
arXiv Detail & Related papers (2023-01-02T12:40:05Z)
- Top Gear or Black Mirror: Inferring Political Leaning From Non-Political Content [8.435739379764408]
Polarization and echo chambers are often studied in the context of explicitly political events such as elections.
Political polarization in non-political contexts is often unknown.
Political leaning is known to correlate with many lifestyle choices, leading to stereotypes such as the "latte-drinking liberal".
arXiv Detail & Related papers (2022-08-11T06:41:23Z)
- Tweets2Stance: Users stance detection exploiting Zero-Shot Learning Algorithms on Tweets [0.06372261626436675]
The aim of the study is to predict the stance of a Party p in regard to each statement s exploiting what the Twitter Party account wrote on Twitter.
Results obtained from multiple experiments show that Tweets2Stance can correctly predict the stance with a general minimum MAE of 1.13, which is a great achievement considering the task complexity.
arXiv Detail & Related papers (2022-04-22T14:00:11Z)
- Political Communities on Twitter: Case Study of the 2022 French Presidential Election [14.783829037950984]
We aim to identify political communities formed on Twitter during the 2022 French presidential election.
We create a large-scale Twitter dataset containing 1.2 million users and 62.6 million tweets that mention keywords relevant to the election.
We perform community detection on a retweet graph of users and propose an in-depth analysis of the stance of each community.
arXiv Detail & Related papers (2022-04-15T12:18:16Z)
- Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)
- Reaching the bubble may not be enough: news media role in online political polarization [58.720142291102135]
A way of reducing polarization would be by distributing cross-partisan news among individuals with distinct political orientations.
This study investigates whether this holds in the context of nationwide elections in Brazil and Canada.
arXiv Detail & Related papers (2021-09-18T11:34:04Z)
- Political Posters Identification with Appearance-Text Fusion [49.55696202606098]
We propose a method that efficiently utilizes appearance features and text vectors to accurately classify political posters.
The majority of this work focuses on political posters that are designed to serve as a promotion of a certain political event.
arXiv Detail & Related papers (2020-12-19T16:14:51Z)
- Political audience diversity and news reliability in algorithmic ranking [54.23273310155137]
We propose using the political diversity of a website's audience as a quality signal.
Using news source reliability ratings from domain experts and web browsing data from a diverse sample of 6,890 U.S. citizens, we first show that websites with more extreme and less politically diverse audiences have lower journalistic standards.
arXiv Detail & Related papers (2020-07-16T02:13:55Z)
- Political Advertising Dataset: the use case of the Polish 2020 Presidential Elections [4.560033258611709]
We present the first publicly open dataset for detecting specific text chunks and categories of political advertising in the Polish language.
It contains 1,705 human-annotated tweets tagged with nine categories, which constitute campaigning under Polish electoral law.
arXiv Detail & Related papers (2020-06-17T23:58:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.