Labeled Datasets for Research on Information Operations
- URL: http://arxiv.org/abs/2411.10609v2
- Date: Wed, 20 Nov 2024 02:10:26 GMT
- Title: Labeled Datasets for Research on Information Operations
- Authors: Ozgur Can Seckin, Manita Pote, Alexander Nwala, Lake Yin, Luca Luceri, Alessandro Flammini, Filippo Menczer
- Abstract summary: We present new labeled datasets about 26 campaigns, which contain both IO posts verified by a social media platform and over 13M posts by 303k accounts that discussed similar topics in the same time frames (control data).
The datasets will facilitate the study of narratives, network interactions, and engagement strategies employed by coordinated accounts across various campaigns and countries.
- Score: 71.34999856621306
- Abstract: Social media platforms have become a hub for political activities and discussions, democratizing participation in these endeavors. However, they have also become an incubator for manipulation campaigns, like information operations (IOs). Some social media platforms have released datasets related to such IOs originating from different countries. However, we lack comprehensive control data that can enable the development of IO detection methods. To bridge this gap, we present new labeled datasets about 26 campaigns, which contain both IO posts verified by a social media platform and over 13M posts by 303k accounts that discussed similar topics in the same time frames (control data). The datasets will facilitate the study of narratives, network interactions, and engagement strategies employed by coordinated accounts across various campaigns and countries. By comparing these coordinated accounts against organic ones, researchers can develop and benchmark IO detection algorithms.
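The benchmarking use mentioned in the abstract can be illustrated with a minimal sketch. It assumes each account carries a ground-truth label (IO or control) and a score from some hypothetical detector; the `Account` class, its field names, and the toy data are illustrative assumptions and not part of the released datasets.

```python
from dataclasses import dataclass


@dataclass
class Account:
    account_id: str  # hypothetical identifier
    is_io: bool      # ground-truth label: True for IO, False for control
    score: float     # hypothetical detector score in [0, 1]


def precision_recall(accounts, threshold):
    """Precision and recall when 'score >= threshold' is read as an IO prediction."""
    tp = sum(1 for a in accounts if a.is_io and a.score >= threshold)
    fp = sum(1 for a in accounts if not a.is_io and a.score >= threshold)
    fn = sum(1 for a in accounts if a.is_io and a.score < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up toy data: two IO accounts and two control accounts.
accounts = [
    Account("a1", True, 0.9),
    Account("a2", True, 0.4),
    Account("c1", False, 0.7),
    Account("c2", False, 0.1),
]

precision, recall = precision_recall(accounts, threshold=0.5)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Sweeping the threshold over a range of values would trace out a precision-recall curve, which is one common way to compare detectors on labeled IO and control accounts.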
Related papers
- Unraveling the Web of Disinformation: Exploring the Larger Context of State-Sponsored Influence Campaigns on Twitter [16.64763746842362]
We study 19 state-sponsored disinformation campaigns that took place on Twitter, originating from various countries.
We build a machine learning-based classifier that can correctly identify up to 94% of accounts from unseen campaigns.
We also run our system in the wild and find more accounts that could potentially belong to state-backed operations.
arXiv Detail & Related papers (2024-07-25T15:03:33Z)
- The Anatomy of Conspirators: Unveiling Traits using a Comprehensive Twitter Dataset [0.0]
We present a novel methodology for constructing a Twitter dataset that encompasses accounts engaged in conspiracy-related activities throughout the year 2022.
This comprehensive collection effort yielded a total of 15K accounts and 37M tweets extracted from their timelines.
We conduct a comparative analysis of the two groups across three dimensions: topics, profiles, and behavioral characteristics.
arXiv Detail & Related papers (2023-08-29T09:35:23Z) - ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z) - Navya3DSeg -- Navya 3D Semantic Segmentation Dataset & split generation
for autonomous vehicles [63.20765930558542]
3D semantic data are useful for core perception tasks such as obstacle detection and ego-vehicle localization.
We propose a new dataset, Navya 3D (Navya3DSeg), with a diverse label space corresponding to a large scale production grade operational domain.
It contains 23 labeled sequences and 25 supplementary sequences without labels, designed to explore self-supervised and semi-supervised semantic segmentation benchmarks on point clouds.
arXiv Detail & Related papers (2023-02-16T13:41:19Z) - Ranking-based Group Identification via Factorized Attention on Social
Tripartite Graph [68.08590487960475]
We propose a novel GNN-based framework named Contextualized Factorized Attention for Group identification (CFAG).
We devise tripartite graph convolution layers to aggregate information from different types of neighborhoods among users, groups, and items.
To cope with the data sparsity issue, we devise a novel propagation augmentation layer, which is based on our proposed factorized attention mechanism.
arXiv Detail & Related papers (2022-11-02T01:42:20Z) - JRDB-Act: A Large-scale Multi-modal Dataset for Spatio-temporal Action,
Social Group and Activity Detection [54.696819174421584]
We introduce JRDB-Act, a multi-modal dataset that reflects a real distribution of human daily life actions in a university campus environment.
JRDB-Act has been densely annotated with atomic actions and comprises over 2.8M action labels.
JRDB-Act comes with social group identification annotations conducive to the task of grouping individuals based on their interactions in the scene.
arXiv Detail & Related papers (2021-06-16T14:43:46Z) - Streaming Social Event Detection and Evolution Discovery in
Heterogeneous Information Networks [90.3475746663728]
Events happen in the real world and in real time, and can be planned and organized for occasions such as social gatherings, festival celebrations, influential meetings, or sports activities.
Social media platforms generate a lot of real-time text information regarding public events with different topics.
However, mining social events is challenging because events typically exhibit heterogeneous texture and metadata are often ambiguous.
arXiv Detail & Related papers (2021-04-02T02:13:10Z) - A General Method to Find Highly Coordinating Communities in Social Media
through Inferred Interaction Links [13.264683014487376]
Political misinformation, astroturfing and organised trolling are online malicious behaviours with significant real-world effects.
We propose a novel temporal window approach that relies on account interactions and metadata alone.
It detects groups of accounts engaging in various behaviours that, in concert, come to execute different goal-based strategies.
arXiv Detail & Related papers (2021-03-05T00:48:23Z) - I-AID: Identifying Actionable Information from Disaster-related Tweets [0.0]
Social media plays a significant role in disaster management by providing valuable data about affected people, donations and help requests.
We propose I-AID, a multimodel approach to automatically categorize tweets into multi-label information types.
Our results indicate that I-AID outperforms state-of-the-art approaches in terms of weighted average F1 score by +6% and +4% on the TREC-IS dataset and COVID-19 Tweets, respectively.
arXiv Detail & Related papers (2020-08-04T19:07:50Z) - Automatic Detection of Influential Actors in Disinformation Networks [0.0]
This paper presents an end-to-end framework to automate detection of disinformation narratives, networks, and influential actors.
The system detects IO accounts with 96% precision, 79% recall, and 96% area under the precision-recall (PR) curve.
Results are corroborated with independent sources of known IO accounts from U.S. Congressional reports, investigative journalism, and IO datasets provided by Twitter.
arXiv Detail & Related papers (2020-05-21T20:15:51Z)