TwiBot-22: Towards Graph-Based Twitter Bot Detection
- URL: http://arxiv.org/abs/2206.04564v2
- Date: Sun, 12 Jun 2022 09:05:30 GMT
- Title: TwiBot-22: Towards Graph-Based Twitter Bot Detection
- Authors: Shangbin Feng, Zhaoxuan Tan, Herun Wan, Ningnan Wang, Zilong Chen,
Binchi Zhang, Qinghua Zheng, Wenqian Zhang, Zhenyu Lei, Shujie Yang, Xinshun
Feng, Qingyue Zhang, Hongrui Wang, Yuhan Liu, Yuyang Bai, Heng Wang, Zijian
Cai, Yanbo Wang, Lijing Zheng, Zihan Ma, Jundong Li, Minnan Luo
- Abstract summary: TwiBot-22 is a graph-based Twitter bot detection benchmark that presents the largest dataset to date.
We re-implement 35 representative Twitter bot detection baselines and evaluate them on 9 datasets, including TwiBot-22.
To facilitate further research, we consolidate all implemented code and datasets into the TwiBot-22 evaluation framework.
- Score: 39.359825215347655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Twitter bot detection has become an increasingly important task to combat
misinformation, facilitate social media moderation, and preserve the integrity
of the online discourse. State-of-the-art bot detection methods generally
leverage the graph structure of the Twitter network, and they exhibit promising
performance when confronting novel Twitter bots that traditional methods fail
to detect. However, very few of the existing Twitter bot detection datasets are
graph-based, and even these few graph-based datasets suffer from limited
dataset scale, incomplete graph structure, as well as low annotation quality.
In fact, the lack of a large-scale graph-based Twitter bot detection benchmark
that addresses these issues has seriously hindered the development and
evaluation of novel graph-based bot detection approaches. In this paper, we
propose TwiBot-22, a comprehensive graph-based Twitter bot detection benchmark
that presents the largest dataset to date, provides diversified entities and
relations on the Twitter network, and has considerably better annotation
quality than existing datasets. In addition, we re-implement 35 representative
Twitter bot detection baselines and evaluate them on 9 datasets, including
TwiBot-22, to promote a fair comparison of model performance and a holistic
understanding of research progress. To facilitate further research, we
consolidate all implemented codes and datasets into the TwiBot-22 evaluation
framework, where researchers could consistently evaluate new models and
datasets. The TwiBot-22 Twitter bot detection benchmark and evaluation
framework are publicly available at https://twibot22.github.io/
Related papers
- My Brother Helps Me: Node Injection Based Adversarial Attack on Social Bot Detection [69.99192868521564]
Social platforms such as Twitter are under siege from a multitude of fraudulent users.
Due to the structure of social networks, the majority of methods are based on graph neural networks (GNNs), which are susceptible to attacks.
We propose a node injection-based adversarial attack method designed to deceive bot detection models.
arXiv Detail & Related papers (2023-10-11T03:09:48Z)
- LMBot: Distilling Graph Knowledge into Language Model for Graph-less Deployment in Twitter Bot Detection [41.043975659303435]
We propose a novel bot detection framework, LMBot, that distills the knowledge of graph neural networks (GNNs) into language models (LMs).
For graph-based datasets, the output of LMs provides input features for the GNN, enabling it to optimize for bot detection and distill knowledge back to the LM in an iterative, mutually enhancing process.
Our experiments demonstrate that LMBot achieves state-of-the-art performance on four Twitter bot detection benchmarks.
arXiv Detail & Related papers (2023-06-30T05:50:26Z)
- BotArtist: Generic approach for bot detection in Twitter via semi-automatic machine learning pipeline [47.61306219245444]
Twitter has become a target for bots and fake accounts, resulting in the spread of false information and manipulation.
This paper introduces a semi-automatic machine learning pipeline (SAMLP) designed to address the challenges correlated with machine learning model development.
We develop a comprehensive bot detection model named BotArtist, based on user profile features.
arXiv Detail & Related papers (2023-05-31T09:12:35Z)
- Machine Learning-based Automatic Annotation and Detection of COVID-19 Fake News [8.020736472947581]
COVID-19 impacted every part of the world, although the misinformation about the outbreak traveled faster than the virus.
Existing work neglects the presence of bots that act as a catalyst in the spread.
We propose an automated approach for labeling data using verified fact-checked statements on a Twitter dataset.
arXiv Detail & Related papers (2022-09-07T13:55:59Z)
- BIC: Twitter Bot Detection with Text-Graph Interaction and Semantic Consistency [22.52777462831911]
We propose a novel model named BIC that makes the text and graph modalities deeply interactive and detects tweet semantic inconsistency.
BIC contains a semantic consistency detection module to learn semantic consistency information from tweets.
Our framework outperforms competitive baselines on a comprehensive Twitter bot benchmark.
arXiv Detail & Related papers (2022-08-17T14:34:40Z)
- Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted, using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
- Detection of Novel Social Bots by Ensembles of Specialized Classifiers [60.63582690037839]
Malicious actors create inauthentic social media accounts controlled in part by algorithms, known as social bots, to disseminate misinformation and agitate online discussion.
We show that different types of bots are characterized by different behavioral features.
We propose a new supervised learning method that trains classifiers specialized for each class of bots and combines their decisions through the maximum rule.
arXiv Detail & Related papers (2020-06-11T22:59:59Z)