Large Engagement Networks for Classifying Coordinated Campaigns and Organic Twitter Trends
- URL: http://arxiv.org/abs/2503.00599v2
- Date: Fri, 28 Mar 2025 14:54:05 GMT
- Title: Large Engagement Networks for Classifying Coordinated Campaigns and Organic Twitter Trends
- Authors: Atul Anand Gopalakrishnan, Jakir Hossain, Tugrulcan Elmas, Ahmet Erdem Sariyuce
- Abstract summary: Social media users and inauthentic accounts may coordinate in promoting their topics. It is challenging to predict if a topic is organic or a coordinated campaign due to the lack of reliable ground truth. In this paper, we create such ground truth by detecting the campaigns promoted by ephemeral astroturfing attacks.
- Score: 1.3595147353266148
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Social media users and inauthentic accounts, such as bots, may coordinate in promoting their topics. Such topics may give the impression that they are organically popular among the public, even though they are astroturfing campaigns that are centrally managed. It is challenging to predict if a topic is organic or a coordinated campaign due to the lack of reliable ground truth. In this paper, we create such ground truth by detecting the campaigns promoted by ephemeral astroturfing attacks. These attacks push any topic to Twitter's (X) trends list by employing bots that tweet in a coordinated manner in a short period and then immediately delete their tweets. We manually curate a dataset of organic Twitter trends. We then create engagement networks out of these datasets, which can serve as a challenging testbed for the graph classification task of distinguishing between campaigns and organic trends. Engagement networks consist of users as nodes and engagements (retweets, replies, and quotes) between users as edges. We release the engagement networks for 179 campaigns and 135 non-campaigns, and also provide finer-grained labels to characterize the type of the campaigns and non-campaigns. Our dataset, LEN (Large Engagement Networks), is available at the URL below. In comparison to traditional graph classification datasets, which are small with tens of nodes and hundreds of edges at most, graphs in LEN are larger. The average graph in LEN has ~11K nodes and ~23K edges. We show that state-of-the-art GNN methods give only mediocre results for campaign vs. non-campaign and campaign type classification on LEN. LEN offers a unique and challenging playfield for the graph classification problem. We believe that LEN will help advance the frontiers of graph classification techniques on large networks and also provide an interesting use case in terms of distinguishing coordinated campaigns and organic trends.
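The engagement-network structure described in the abstract (users as nodes, engagements as edges) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the LEN file format or the authors' code: the `(source, target, kind)` record layout, the choice to treat edges as directed from the engaging user to the engaged user, the function name `build_engagement_network`, and the sample user names are all hypothetical.

```python
from collections import defaultdict

# Engagement kinds named in the abstract; the string labels are assumed.
ENGAGEMENT_TYPES = {"retweet", "reply", "quote"}


def build_engagement_network(engagements):
    """Build a node set and a typed edge-count map from engagement records.

    `engagements` is an iterable of (source_user, target_user, kind) tuples.
    This record layout is hypothetical. Edges are treated as directed from
    the user who engages to the user being engaged with (an assumption).
    """
    nodes = set()
    edge_counts = defaultdict(lambda: defaultdict(int))
    for source, target, kind in engagements:
        if kind not in ENGAGEMENT_TYPES:
            raise ValueError(f"unknown engagement type: {kind!r}")
        nodes.add(source)
        nodes.add(target)
        # Count each engagement kind separately per (source, target) pair.
        edge_counts[(source, target)][kind] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    edges = {pair: dict(counts) for pair, counts in edge_counts.items()}
    return nodes, edges


# Toy records for one trend (made-up users, not data from LEN).
sample = [
    ("alice", "bob", "retweet"),
    ("carol", "bob", "reply"),
    ("alice", "bob", "quote"),
    ("dave", "carol", "retweet"),
]
nodes, edges = build_engagement_network(sample)
print(len(nodes), "nodes,", len(edges), "distinct user pairs")
```

In a graph classification setting, one such network would be built per trend and paired with a campaign or non-campaign label; how node and edge features are derived is left open here.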
Related papers
- Density-aware Walks for Coordinated Campaign Detection [1.3595147353266148]
Coordinated campaigns frequently exploit social media platforms by artificially amplifying topics, making inauthentic trends appear organic, and misleading users into engagement. Our work focuses on detecting coordinated campaigns by modeling the problem as a graph classification task. We leverage the recently introduced Large Engagement Networks (LEN) dataset, which contains over 300 networks capturing engagement patterns from both fake and authentic trends on Twitter prior to the 2023 Turkish elections.
arXiv Detail & Related papers (2025-06-16T18:44:38Z) - Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? [62.12375949429938]
We propose a multi-modal prompt learning paradigm to adapt pre-trained Graph Neural Networks to downstream tasks and data. Our new paradigm embeds the graphs directly in the same space as the Large Language Models (LLMs) by learning both graph prompts and text prompts simultaneously. We build the first CLIP-style zero-shot classification prototype that can generalize GNNs to unseen classes with extremely weak text supervision.
arXiv Detail & Related papers (2024-12-11T08:03:35Z) - Labeled Datasets for Research on Information Operations [71.34999856621306]
We present new labeled datasets about 26 campaigns, which contain both IO posts verified by a social media platform and over 13M posts by 303k accounts that discussed similar topics in the same time frames (control data).
The datasets will facilitate the study of narratives, network interactions, and engagement strategies employed by coordinated accounts across various campaigns and countries.
arXiv Detail & Related papers (2024-11-15T22:15:01Z) - Unraveling the Web of Disinformation: Exploring the Larger Context of State-Sponsored Influence Campaigns on Twitter [16.64763746842362]
We study 19 state-sponsored disinformation campaigns that took place on Twitter, originating from various countries.
We build a machine learning-based classifier that can correctly identify up to 94% of accounts from unseen campaigns.
We also run our system in the wild and find more accounts that could potentially belong to state-backed operations.
arXiv Detail & Related papers (2024-07-25T15:03:33Z) - Domain-adaptive Message Passing Graph Neural Network [67.35534058138387]
Cross-network node classification (CNNC) aims to classify nodes in a label-deficient target network by transferring the knowledge from a source network with abundant labels.
We propose a domain-adaptive message passing graph neural network (DM-GNN), which integrates graph neural network (GNN) with conditional adversarial domain adaptation.
arXiv Detail & Related papers (2023-08-31T05:26:08Z) - DoubleH: Twitter User Stance Detection via Bipartite Graph Neural Networks [9.350629400940493]
We crawl a large-scale dataset of the 2020 US presidential election and automatically label all users by manually tagged hashtags.
We propose a bipartite graph neural network model, DoubleH, which aims to better utilize homogeneous and heterogeneous information in user stance detection tasks.
arXiv Detail & Related papers (2023-01-20T19:20:10Z) - Design and analysis of tweet-based election models for the 2021 Mexican legislative election [55.41644538483948]
We use a dataset of 15 million election-related tweets in the six months preceding election day.
We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods.
arXiv Detail & Related papers (2023-01-02T12:40:05Z) - Contextual Bandits for Advertising Campaigns: A Diffusion-Model Independent Approach (Extended Version) [73.59962178534361]
We study an influence problem in which little is assumed to be known about the diffusion network or about the model that determines how information may propagate.
In this setting, an explore-exploit approach could be used to learn the key underlying diffusion parameters, while running the campaign.
We describe and compare two methods of contextual multi-armed bandits, with upper-confidence bounds on the remaining potential of influencers.
arXiv Detail & Related papers (2022-01-13T22:06:10Z) - The Spread of Propaganda by Coordinated Communities on Social Media [43.2770127582382]
We analyze the spread of propaganda and its interplay with coordinated behavior on a large Twitter dataset about the 2019 UK general election.
The combination of the use of propaganda and coordinated behavior allows us to uncover the authenticity and harmfulness of the different communities.
arXiv Detail & Related papers (2021-09-27T13:39:10Z) - Towards A Sentiment Analyzer for Low-Resource Languages [0.0]
This research aims to analyse the sentiment of users towards a particular trending topic that was actively and widely discussed at the time.
We use the hashtag #kpujangancurang, which was the trending topic during the Indonesian presidential election in 2019.
This research uses the RapidMiner tool to generate the Twitter data and compares Naive Bayes, K-Nearest Neighbor, Decision Tree, and Multi-Layer Perceptron classification methods for classifying the sentiment of the Twitter data.
arXiv Detail & Related papers (2020-11-12T13:50:00Z) - Adversarial Attack on Large Scale Graph [58.741365277995044]
Recent studies have shown that graph neural networks (GNNs) are vulnerable to perturbations due to a lack of robustness.
Currently, most work on attacking GNNs mainly uses gradient information to guide the attack and achieves outstanding performance.
We argue that the main reason is that they have to use the whole graph for attacks, resulting in increasing time and space complexity as the data scale grows.
We present a practical metric named Degree Assortativity Change (DAC) to measure the impacts of adversarial attacks on graph data.
arXiv Detail & Related papers (2020-09-08T02:17:55Z) - Efficient, Direct, and Restricted Black-Box Graph Evasion Attacks to Any-Layer Graph Neural Networks via Influence Function [62.89388227354517]
Graph neural network (GNN), the mainstream method to learn on graph data, is vulnerable to graph evasion attacks.
Existing work has at least one of the following drawbacks: 1) limited to directly attacking two-layer GNNs; 2) inefficient; and 3) impractical, as it needs to know all or part of the GNN model parameters.
We propose an influence-based efficient, direct, and restricted black-box evasion attack on any-layer GNNs.
arXiv Detail & Related papers (2020-09-01T03:24:51Z) - Fine-Grained Crowd Counting [59.63412475367119]
Current crowd counting algorithms are only concerned with the number of people in an image.
We propose fine-grained crowd counting, which differentiates a crowd into categories based on the low-level behavior attributes of the individuals.
arXiv Detail & Related papers (2020-07-13T01:31:12Z) - TIMME: Twitter Ideology-detection via Multi-task Multi-relational Embedding [26.074367752142198]
We aim at solving the problem of predicting people's ideology, or political tendency.
We estimate it by using Twitter data, and formalize it as a classification problem.
arXiv Detail & Related papers (2020-06-02T00:00:39Z)