MGTAB: A Multi-Relational Graph-Based Twitter Account Detection Benchmark

URL: http://arxiv.org/abs/2301.01123v1
Date: Tue, 3 Jan 2023 14:43:40 GMT
Title: MGTAB: A Multi-Relational Graph-Based Twitter Account Detection Benchmark
Authors: Shuhao Shi, Kai Qiao, Jian Chen, Shuai Yang, Jie Yang, Baojie Song, Linyuan Wang, Bin Yan
Abstract summary: We propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first graph-based benchmark for account detection. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diverse relations. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest original dataset in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diverse relations. As user features, MGTAB uses the 20 user property features with the greatest information gain, together with user tweet features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and suggest potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
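The abstract says MGTAB keeps the 20 user property features with the greatest information gain. The following is a minimal sketch of that kind of ranking, using information gain in its usual entropy-reduction form. It assumes every property has already been discretized (continuous properties such as follower counts would first need binning); the feature names and toy labels are hypothetical, and the paper's exact selection procedure may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their conditional entropy given a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(features, labels, k=20):
    """Rank named (already discretized) features by information gain and keep the k best."""
    ranked = sorted(
        features.items(),
        key=lambda item: information_gain(item[1], labels),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical toy data: 1 = bot, 0 = human; feature names are illustrative only.
    labels = [1, 1, 0, 0]
    features = {
        "verified": [0, 0, 1, 1],         # perfectly separates the classes (gain 1.0)
        "default_profile": [1, 0, 1, 0],  # independent of the label (gain 0.0)
        "geo_enabled": [1, 1, 1, 0],      # partially informative (gain ~0.311)
    }
    print(top_k_features(features, labels, k=2))  # → ['verified', 'geo_enabled']
```

With real data, `k=20` would correspond to the 20 property features mentioned in the abstract; the binning scheme for continuous properties is a design choice this sketch leaves open.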

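MGTAB links users through 7 relation types, and the abstract reports that graph-based detectors improve when multiple relations are used. The sketch below illustrates one common way a model can exploit typed edges: a single relation-aware message-passing step in the spirit of relational GCNs (R-GCN), with a separate weight matrix per relation and a mean over each relation's neighbours. It is not the benchmark's code; the relation names, feature sizes, and weight values are hypothetical, and a real model would learn the weights.

```python
from collections import defaultdict


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, x) for x in vector]


def rgcn_layer(h, edges, rel_weights, self_weight):
    """One relation-aware message-passing step (R-GCN style).

    h: node -> feature vector
    edges: (source, target, relation) triples; messages flow source -> target
    rel_weights: relation -> weight matrix (hypothetical, untrained values)
    self_weight: weight matrix applied to each node's own features
    """
    # Group each node's incoming neighbours by relation type.
    incoming = defaultdict(lambda: defaultdict(list))
    for src, dst, rel in edges:
        incoming[dst][rel].append(src)

    out = {}
    for node, feat in h.items():
        acc = matvec(self_weight, feat)  # self-connection term
        for rel, sources in incoming[node].items():
            # Mean over this relation's neighbours (normalization 1 / |N_r(node)|).
            for src in sources:
                msg = matvec(rel_weights[rel], h[src])
                acc = [a + m / len(sources) for a, m in zip(acc, msg)]
        out[node] = relu(acc)
    return out


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    h = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    # Hypothetical relation names standing in for MGTAB's relation types.
    edges = [("b", "a", "follow"), ("c", "a", "follow"), ("c", "b", "mention")]
    rel_weights = {"follow": identity, "mention": [[2.0, 0.0], [0.0, 2.0]]}
    out = rgcn_layer(h, edges, rel_weights, identity)
    print(out)  # → {'a': [1.5, 1.0], 'b': [2.0, 3.0], 'c': [1.0, 1.0]}
```

Keeping one weight matrix per relation lets the model treat, for example, a follow edge differently from a mention edge, which is one plausible reason multi-relational graphs help; stacking several such layers and adding a classifier head would give a complete (hypothetical) detector.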
Related papers

RoGBot: Relationship-Oblivious Graph-based Neural Network with Contextual Knowledge for Bot Detection
We propose a novel framework that integrates detailed textual features with enriched user metadata. Our method uses transformer-based models (e.g., BERT) to extract deep semantic embeddings from tweets. Experimental results on the Cresci-15, Cresci-17, and PAN 2019 datasets demonstrate the robustness of our approach.
arXiv (2025-10-25T05:14:58Z)
Mind the Gap! Static and Interactive Evaluations of Large Audio Models
Large Audio Models (LAMs) are designed to power voice-native experiences. This study introduces an interactive approach to evaluate LAMs and collects 7,500 LAM interactions from 484 participants.
arXiv (2025-02-21T20:29:02Z)
Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings
Commit message generation is a crucial task in software engineering that is challenging to evaluate correctly. We use an online metric (the number of edits users introduce before committing the generated messages to the VCS) to select metrics for offline experiments. Our results indicate that edit distance exhibits the highest correlation with the online metric, whereas commonly used similarity metrics such as BLEU and METEOR show low correlation.
arXiv (2024-10-15T20:32:07Z)
M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection
Large Language Models (LLMs) have brought an unprecedented surge in machine-generated text (MGT) across diverse channels. This raises legitimate concerns about its potential misuse and societal implications. We introduce M4GT-Bench, a new benchmark based on a multilingual, multi-domain, and multi-generator corpus of MGTs.
arXiv (2024-02-17T02:50:33Z)
SeGA: Preference-Aware Self-Contrastive Learning with Prompts for Anomalous User Detection on Twitter
We propose SeGA, a preference-aware self-contrastive learning method for anomalous user detection. SeGA uses large language models to summarize user preferences from their posts. We empirically validate the effectiveness of the model design and pre-training strategies.
arXiv (2023-12-17T05:35:28Z)
Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis
We present code that successfully replicates results from six popular and recent graph recommendation models. We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations. By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
arXiv (2023-08-01T09:31:44Z)
Towards Detecting Inauthentic Coordination in Twitter Likes Data
Users customarily take engagement metrics such as likes as a neutral proxy for quality and authority. This incentivizes like manipulation to influence public opinion through coordinated inauthentic behavior (CIB). CIB targeted at likes is largely unstudied, because collecting suitable data about users' liking behavior is non-trivial. This paper contributes a scripted algorithm to collect suitable liking data from Twitter and a 30-day dataset of liking data collected from the Danish political Twittersphere #dkpol.
arXiv (2023-05-12T11:24:26Z)
DoubleH: Twitter User Stance Detection via Bipartite Graph Neural Networks
We crawl a large-scale dataset of the 2020 US presidential election and automatically label all users using manually tagged hashtags. We propose a bipartite graph neural network model, DoubleH, which aims to better utilize homogeneous and heterogeneous information in user stance detection tasks.
arXiv (2023-01-20T19:20:10Z)
Self-supervised Graph Representation Learning for Black Market Account Detection
Black market accounts (BMAs) are not directly involved in fraud and are therefore more difficult to detect. This paper describes our BMA detection system SGRL (Self-supervised Graph Learning) used in WeChat, a representative MMMA with over a billion users. We deploy SGRL in the online environment to detect BMAs on the billion-scale WeChat graph, and it outperforms the alternative by 7.27% on the online evaluation measure.
arXiv (2022-12-06T00:42:00Z)
Learning Gait Representation from Massive Unlabelled Walking Videos: A Benchmark
This paper proposes a large-scale self-supervised benchmark for gait recognition with contrastive learning. We collect a large-scale unlabelled gait dataset, GaitLU-1M, consisting of 1.02M walking sequences. We evaluate the pre-trained model on four widely used gait benchmarks (CASIA-B, OU-M, GREW, and Gait3D), with or without transfer learning.
arXiv (2022-06-28T12:33:42Z)
Gait Recognition in the Wild: A Large-scale Benchmark and NAS-based Baseline
Gait benchmarks empower the research community to train and evaluate high-performance gait recognition systems. GREW is the first large-scale dataset for gait recognition in the wild. SPOSGait is the first NAS-based gait recognition model.
arXiv (2022-05-05T14:57:39Z)
Detection Recovery in Online Multi-Object Tracking with Sparse Graph Tracker
In existing joint detection and tracking methods, pairwise relational features are used to match previous tracklets to current detections. We present Sparse Graph Tracker (SGT), a novel online graph tracker using higher-order relational features, which are more discriminative. On the MOT16/17/20 and HiEve Challenge benchmarks, SGT outperforms state-of-the-art trackers while running at real-time inference speed.
arXiv (2022-05-02T15:09:36Z)
ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> in images. We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs. Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into a visual-semantic joint embedding space, and obtains detection results by measuring their similarities.
arXiv (2020-08-14T09:11:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.