LMBot: Distilling Graph Knowledge into Language Model for Graph-less
Deployment in Twitter Bot Detection
- URL: http://arxiv.org/abs/2306.17408v3
- Date: Wed, 3 Jan 2024 05:00:00 GMT
- Title: LMBot: Distilling Graph Knowledge into Language Model for Graph-less
Deployment in Twitter Bot Detection
- Authors: Zijian Cai, Zhaoxuan Tan, Zhenyu Lei, Zifeng Zhu, Hongrui Wang,
Qinghua Zheng, Minnan Luo
- Abstract summary: We propose a novel bot detection framework LMBot that distills the knowledge of graph neural networks (GNNs) into language models (LMs).
For graph-based datasets, the output of LMs provides input features for the GNN, enabling it to optimize for bot detection and distill knowledge back to the LM in an iterative, mutually enhancing process.
Our experiments demonstrate that LMBot achieves state-of-the-art performance on four Twitter bot detection benchmarks.
- Score: 41.043975659303435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As malicious actors employ increasingly advanced and widespread bots to
disseminate misinformation and manipulate public opinion, the detection of
Twitter bots has become a crucial task. Though graph-based Twitter bot
detection methods achieve state-of-the-art performance, we find that their
inference depends on the neighbor users multi-hop away from the targets, and
fetching neighbors is time-consuming and may introduce bias. At the same time,
we find that after finetuning on Twitter bot detection, pretrained language
models achieve competitive performance and do not require a graph structure
during deployment. Inspired by this finding, we propose a novel bot detection
framework LMBot that distills the knowledge of graph neural networks (GNNs)
into language models (LMs) for graph-less deployment in Twitter bot detection
to combat the challenge of data dependency. Moreover, LMBot is compatible with
graph-based and graph-less datasets. Specifically, we first represent each user
as a textual sequence and feed them into the LM for domain adaptation. For
graph-based datasets, the output of LMs provides input features for the GNN,
enabling it to optimize for bot detection and distill knowledge back to the LM
in an iterative, mutually enhancing process. Armed with the LM, we can perform
graph-less inference, which resolves the graph data dependency and sampling
bias issues. For datasets without graph structure, we simply replace the GNN
with an MLP, which has also shown strong performance. Our experiments
demonstrate that LMBot achieves state-of-the-art performance on four Twitter
bot detection benchmarks. Extensive studies also show that LMBot is more
robust, versatile, and efficient compared to graph-based Twitter bot detection
methods.
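The abstract does not spell out the distillation objective, so the following is only a minimal sketch of generic temperature-scaled knowledge distillation, with a GNN in the teacher role and an LM in the student role. The function names, the temperature `temperature`, and the mixing weight `alpha` are illustrative assumptions, not details taken from LMBot.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic distillation loss for one user (assumed form, not LMBot's).

    student_logits: LM outputs for the classes (e.g. human, bot).
    teacher_logits: GNN outputs for the same user.
    label: index of the ground-truth class.
    """
    # Hard-label term: cross-entropy of the student against the true label.
    student_probs = softmax(student_logits)
    cross_entropy = -math.log(student_probs[label])

    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q)
             for p, q in zip(teacher_soft, student_soft) if p > 0)

    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl


if __name__ == "__main__":
    # Student agrees with the teacher: the soft term vanishes.
    print(distillation_loss([2.0, -1.0], [2.0, -1.0], label=0))
    # Student disagrees with the teacher: both terms grow.
    print(distillation_loss([-1.0, 2.0], [2.0, -1.0], label=0))
```

Once a student has been trained this way, inference needs only the student's own logits, which is the property the abstract calls graph-less deployment.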
Related papers
- LGB: Language Model and Graph Neural Network-Driven Social Bot Detection [43.92522451274129]
Malicious social bots pursue their goals by spreading misinformation and inflaming public opinion.
We propose a novel social bot detection framework LGB, which consists of two main components: language model (LM) and graph neural network (GNN).
Experiments on two real-world datasets demonstrate that LGB consistently outperforms state-of-the-art baseline models by up to 10.95%.
arXiv Detail & Related papers (2024-06-13T02:47:38Z)
- Parameter-Efficient Tuning Large Language Models for Graph Representation Learning [62.26278815157628]
We introduce Graph-aware Parameter-Efficient Fine-Tuning (GPEFT), a novel approach for efficient graph representation learning.
We use a graph neural network (GNN) to encode structural information from neighboring nodes into a graph prompt.
We validate our approach through comprehensive experiments conducted on 8 different text-rich graphs, observing an average improvement of 2% in hit@1 and Mean Reciprocal Rank (MRR) in link prediction evaluations.
arXiv Detail & Related papers (2024-04-28T18:36:59Z)
- Efficient End-to-end Language Model Fine-tuning on Graphs [21.23522552579571]
Learning from Text-Attributed Graphs (TAGs) has attracted significant attention due to its wide range of real-world applications.
We introduce LEADING, a novel and efficient approach for end-to-end fine-tuning of language models on TAGs.
Our proposed approach demonstrates superior performance, achieving state-of-the-art (SOTA) results on the ogbn-arxiv leaderboard.
arXiv Detail & Related papers (2023-12-07T22:35:16Z)
- Multimodal Detection of Bots on X (Twitter) using Transformers [6.390468088226495]
We propose a novel method for detecting bots in social media.
We use only the user description field and images of three channels.
Experiments conducted on the Cresci'17 and TwiBot-20 datasets demonstrate valuable advantages of our introduced approaches.
arXiv Detail & Related papers (2023-08-28T10:51:11Z)
- SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task.
We then generate node embeddings using the last hidden states of the finetuned LM.
arXiv Detail & Related papers (2023-08-03T07:00:04Z)
- Muti-scale Graph Neural Network with Signed-attention for Social Bot
Detection: A Frequency Perspective [10.089319405788277]
The presence of a large number of bots on social media has adverse effects.
The graph neural network (GNN) can effectively leverage the social relationships between users and achieve excellent results in detecting bots.
This paper proposes a Multi-scale with Signed-attention Graph Filter for social bot detection called MSGS.
arXiv Detail & Related papers (2023-07-05T00:40:19Z)
- Model Inversion Attacks against Graph Neural Networks [65.35955643325038]
We study model inversion attacks against Graph Neural Networks (GNNs).
In this paper, we present GraphMI to infer the private training graph data.
Our experimental results show that such defenses are not sufficiently effective and call for more advanced defenses against privacy attacks.
arXiv Detail & Related papers (2022-09-16T09:13:43Z)
- TwiBot-22: Towards Graph-Based Twitter Bot Detection [39.359825215347655]
TwiBot-22 is a graph-based Twitter bot detection benchmark that presents the largest dataset to date.
We re-implement 35 representative Twitter bot detection baselines and evaluate them on 9 datasets, including TwiBot-22.
To facilitate further research, we consolidate all implemented codes and datasets into the TwiBot-22 evaluation framework.
arXiv Detail & Related papers (2022-06-09T15:23:37Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the
US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
- Detection of Novel Social Bots by Ensembles of Specialized Classifiers [60.63582690037839]
Malicious actors create inauthentic social media accounts controlled in part by algorithms, known as social bots, to disseminate misinformation and agitate online discussion.
We show that different types of bots are characterized by different behavioral features.
We propose a new supervised learning method that trains classifiers specialized for each class of bots and combines their decisions through the maximum rule.
arXiv Detail & Related papers (2020-06-11T22:59:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.