Related papers: What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection

URL: http://arxiv.org/abs/2402.00371v2
Date: Thu, 4 Jul 2024 23:37:40 GMT
Title: What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection
Authors: Shangbin Feng, Herun Wan, Ningnan Wang, Zhaoxuan Tan, Minnan Luo, Yulia Tsvetkov
Abstract summary: We investigate the opportunities and risks of large language models in social bot detection. We propose a mixture-of-heterogeneous-experts framework to divide and conquer diverse user information modalities. Experiments show that instruction tuning on 1,000 annotated examples produces specialized LLMs that outperform state-of-the-art baselines.
Score: 48.572932773403274
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Social media bot detection has always been an arms race between advancements in machine learning bot detectors and adversarial bot strategies to evade detection. In this work, we bring the arms race to the next level by investigating the opportunities and risks of state-of-the-art large language models (LLMs) in social bot detection. To investigate the opportunities, we design novel LLM-based bot detectors by proposing a mixture-of-heterogeneous-experts framework to divide and conquer diverse user information modalities. To illuminate the risks, we explore the possibility of LLM-guided manipulation of user textual and structured information to evade detection. Extensive experiments with three LLMs on two datasets demonstrate that instruction tuning on merely 1,000 annotated examples produces specialized LLMs that outperform state-of-the-art baselines by up to 9.1% on both datasets, while LLM-guided manipulation strategies could significantly bring down the performance of existing bot detectors by up to 29.6% and harm the calibration and reliability of bot detection systems.

Related papers

Knowledge Transfer from LLMs to Provenance Analysis: A Semantic-Augmented Method for APT Detection [1.2571354974258824]
We propose a new strategy for taking advantage of Large Language Models (LLMs) in provenance-based threat detection. LLMs offer additional details in provenance data interpretation, leveraging their knowledge of system calls, software identity, and high-level understanding of application execution context. In our evaluation, supervised threat detection achieves a precision of 99.0%, and semi-supervised anomaly detection attains a precision of 96.9%.
arXiv Detail & Related papers (2025-03-24T03:51:09Z)
DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios [38.952481877244644]
We present a new benchmark, DetectRL, highlighting that even state-of-the-art (SOTA) detection techniques still underperformed in this task. Our development of DetectRL reveals the strengths and limitations of current SOTA detectors. We believe DetectRL could serve as an effective benchmark for assessing detectors in real-world scenarios.
arXiv Detail & Related papers (2024-10-31T09:01:25Z)
Humanizing the Machine: Proxy Attacks to Mislead LLM Detectors [31.18762591875725]
We introduce a proxy-attack strategy that effortlessly compromises large language models (LLMs). Our method attacks the source model by leveraging a reinforcement learning (RL) fine-tuned humanized small language model (SLM) in the decoding phase. Our findings show that the proxy-attack strategy effectively deceives the leading detectors, resulting in an average AUROC drop of 70.4% across multiple datasets.
arXiv Detail & Related papers (2024-10-25T00:35:00Z)
Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement [51.601916604301685]
Large language models (LLMs) generate content that can undermine trust in online discourse. Current methods often focus on binary classification, failing to address the complexities of real-world scenarios like human-AI collaboration. To move beyond binary classification and address these challenges, we propose a new paradigm for detecting LLM-generated content.
arXiv Detail & Related papers (2024-10-18T08:14:10Z)
Intent Detection in the Age of LLMs [3.755082744150185]
Intent detection is a critical component of task-oriented dialogue systems (TODS). Traditional approaches relied on computationally efficient supervised sentence transformer encoder models. The emergence of generative large language models (LLMs) with intrinsic world knowledge presents new opportunities to address these challenges.
arXiv Detail & Related papers (2024-10-02T15:01:55Z)
On the Vulnerability of LLM/VLM-Controlled Robotics [54.57914943017522]
We highlight vulnerabilities in robotic systems integrating large language models (LLMs) and vision-language models (VLMs) due to input modality sensitivities. Our results show that simple input perturbations reduce task execution success rates by 22.2% and 14.6% in two representative LLM/VLM-controlled robotic systems.
arXiv Detail & Related papers (2024-02-15T22:01:45Z)
BotSSCL: Social Bot Detection with Self-Supervised Contrastive Learning [6.317191658158437]
We propose a novel framework for social Bot detection with Self-Supervised Contrastive Learning (BotSSCL). BotSSCL uses contrastive learning to distinguish between social bots and humans in the embedding space to improve linear separability. We demonstrate BotSSCL's robustness against adversarial attempts to manipulate bot accounts to evade detection.
arXiv Detail & Related papers (2024-02-06T06:13:13Z)
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text [98.28130949052313]
A score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. We propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data.
arXiv Detail & Related papers (2024-01-22T16:09:47Z)
Detecting Phishing Sites Using ChatGPT [2.3999111269325266]
We propose a novel system called ChatPhishDetector that utilizes Large Language Models (LLMs) to detect phishing sites. Our system involves leveraging a web crawler to gather information from websites, generating prompts for LLMs based on the crawled data, and then retrieving the detection results from the responses generated by the LLMs. The experimental results using GPT-4V demonstrated outstanding performance, with a precision of 98.7% and a recall of 99.6%, outperforming the detection results of other LLMs and existing systems.
arXiv Detail & Related papers (2023-06-09T11:30:08Z)
Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users. Recent works have proposed algorithms to detect LLM-generated text and protect LLMs. We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z)
Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc. Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques. In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
Detection of Novel Social Bots by Ensembles of Specialized Classifiers [60.63582690037839]
Malicious actors create inauthentic social media accounts controlled in part by algorithms, known as social bots, to disseminate misinformation and agitate online discussion. We show that different types of bots are characterized by different behavioral features. We propose a new supervised learning method that trains classifiers specialized for each class of bots and combines their decisions through the maximum rule.
arXiv Detail & Related papers (2020-06-11T22:59:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.