How Do Social Bots Participate in Misinformation Spread? A Comprehensive Dataset and Analysis
- URL: http://arxiv.org/abs/2408.09613v2
- Date: Thu, 17 Apr 2025 09:37:38 GMT
- Title: How Do Social Bots Participate in Misinformation Spread? A Comprehensive Dataset and Analysis
- Authors: Herun Wan, Minnan Luo, Zihan Ma, Guang Dai, Xiang Zhao
- Abstract summary: This paper is the first to explore the interplay between social bots and misinformation on the Sina Weibo platform. From the misinformation perspective, the dataset is multimodal, containing 11,393 pieces of misinformation and 16,416 pieces of real information. From the social bot perspective, the dataset contains 65,749 social bots and 345,886 genuine accounts.
- Score: 17.53279395036265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social media platforms are an ideal medium for spreading misinformation, and social bots may accelerate the spread. This paper is the first to explore the interplay between social bots and misinformation on the Sina Weibo platform. We construct a large-scale dataset that contains annotations of both misinformation and social bots. From the misinformation perspective, the dataset is multimodal, containing 11,393 pieces of misinformation and 16,416 pieces of real information. From the social bot perspective, it contains 65,749 social bots and 345,886 genuine accounts, for which we propose a weakly-supervised annotator that labels accounts automatically. Extensive experiments show that the dataset is the most comprehensive of its kind, that misinformation and real information are distinguishable, and that the social bot annotations are of high quality. Further analysis illustrates that: (i) social bots are deeply involved in information spread; (ii) misinformation on the same topics has similar content, providing the basis for echo chambers, and social bots amplify this phenomenon; and (iii) social bots generate similar content aimed at manipulating public opinion.
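The abstract states only that the bot labels come from a weakly-supervised annotator; its construction is not described here. As a minimal sketch of how weak supervision of this kind typically works, the snippet below combines several noisy heuristic labeling functions by majority vote. All function names, features, and thresholds are illustrative assumptions, not the authors' actual annotator:

```python
# Minimal weak-supervision sketch (hypothetical heuristics, NOT the paper's
# annotator): several noisy labeling functions vote on whether an account
# is a bot; a majority vote produces the final weak label.

BOT, HUMAN, ABSTAIN = 1, 0, -1

def lf_posting_rate(account):
    # Bots often post at an unusually high rate.
    return BOT if account["posts_per_day"] > 50 else ABSTAIN

def lf_profile_complete(account):
    # Genuine users tend to fill in a bio and avatar.
    return HUMAN if account["has_bio"] and account["has_avatar"] else ABSTAIN

def lf_follower_ratio(account):
    # Following far more accounts than follow back is a weak bot signal.
    ratio = account["following"] / max(account["followers"], 1)
    return BOT if ratio > 20 else ABSTAIN

LABELING_FUNCTIONS = [lf_posting_rate, lf_profile_complete, lf_follower_ratio]

def weak_label(account):
    """Aggregate the noisy votes; abstain on ties or when all functions abstain."""
    votes = [lf(account) for lf in LABELING_FUNCTIONS]
    bot_votes, human_votes = votes.count(BOT), votes.count(HUMAN)
    if bot_votes == human_votes:
        return ABSTAIN
    return BOT if bot_votes > human_votes else HUMAN
```

In practice such weak labels are then used to train a downstream classifier, which can generalize beyond the hand-written heuristics.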
Related papers
- MisinfoEval: Generative AI in the Era of "Alternative Facts" [50.069577397751175]
We introduce a framework for generating and evaluating large language model (LLM) based misinformation interventions.
We present (1) an experiment with a simulated social media environment to measure effectiveness of misinformation interventions, and (2) a second experiment with personalized explanations tailored to the demographics and beliefs of users.
Our findings confirm that LLM-based interventions are highly effective at correcting user behavior.
arXiv Detail & Related papers (2024-10-13T18:16:50Z)
- Easy-access online social media metrics can effectively identify misinformation sharing users [41.94295877935867]
We find that higher tweet frequency is positively associated with sharing low-factuality content, while account age is negatively associated with it.
Our findings show that relying on these easy-access social network metrics could serve as a low-barrier approach for initial identification of users who are more likely to spread misinformation.
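A hedged sketch of how reported associations like these could be turned into a low-barrier screening heuristic: risk rises with tweet frequency and falls with account age. The formula, coefficients, and threshold below are illustrative assumptions, not the authors' fitted model:

```python
import math

def misinfo_risk_score(tweets_per_day, account_age_days):
    # Hypothetical scoring rule illustrating the reported associations;
    # log1p dampens extreme values. Coefficients are made up for
    # illustration, not fitted to any data.
    freq_term = math.log1p(tweets_per_day)
    age_term = math.log1p(account_age_days / 365)  # age in years, dampened
    return freq_term - age_term

def flag_for_review(tweets_per_day, account_age_days, threshold=2.0):
    # Flag accounts whose easy-access metrics suggest elevated risk.
    return misinfo_risk_score(tweets_per_day, account_age_days) > threshold
```

Such a score would only be a first-pass filter; flagged accounts would still need closer inspection before any action.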
arXiv Detail & Related papers (2024-08-27T16:41:13Z)
- Entendre, a Social Bot Detection Tool for Niche, Fringe, and Extreme Social Media [1.4913052010438639]
We introduce Entendre, an open-access, scalable, and platform-agnostic bot detection framework.
We exploit the idea that most social platforms share a generic template, where users can post content, approve content, and provide a bio.
To demonstrate Entendre's effectiveness, we used it to explore the presence of bots among accounts posting racist content on the now-defunct right-wing platform Parler.
arXiv Detail & Related papers (2024-08-13T13:50:49Z)
- Adversarial Botometer: Adversarial Analysis for Social Bot Detection [1.9280536006736573]
Social bots produce content that mimics human creativity.
Malicious social bots emerge to deceive people with their unrealistic content.
We evaluate the behavior of a text-based bot detector in a competitive environment.
arXiv Detail & Related papers (2024-05-03T11:28:21Z)
- "I'm in the Bluesky Tonight": Insights from a Year Worth of Social Data [0.18416014644193066]
We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social.
The dataset contains the complete post history of over 4M users (81% of all registered accounts), totalling 235M posts.
arXiv Detail & Related papers (2024-04-29T16:43:39Z)
- BotArtist: Generic approach for bot detection in Twitter via semi-automatic machine learning pipeline [47.61306219245444]
Twitter has become a target for bots and fake accounts, resulting in the spread of false information and manipulation.
This paper introduces a semi-automatic machine learning pipeline (SAMLP) designed to address the challenges associated with machine learning model development.
We develop a comprehensive bot detection model named BotArtist, based on user profile features.
arXiv Detail & Related papers (2023-05-31T09:12:35Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification.
Our model shows superior performance in detecting non-matched image-text pairs when training data is insufficient.
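A minimal sketch of the underlying idea: a contrastively trained model maps text and images into a shared embedding space, so low text-image similarity suggests a non-matched pair. The embeddings and threshold below are hypothetical placeholders, not the paper's model:

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def is_mismatched(text_emb, image_emb, threshold=0.5):
    # A contrastively trained encoder places matched image-text pairs close
    # together; similarity below the (assumed) threshold suggests the image
    # does not support the text.
    return cosine(text_emb, image_emb) < threshold
```

In a real system the embeddings would come from trained text and image encoders (e.g. a CLIP-style model), and the threshold would be tuned on validation data.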
arXiv Detail & Related papers (2023-04-19T02:53:59Z)
- Machine Learning-based Automatic Annotation and Detection of COVID-19 Fake News [8.020736472947581]
COVID-19 impacted every part of the world, and misinformation about the outbreak traveled faster than the virus itself.
Existing work neglects the presence of bots that act as a catalyst in the spread.
We propose an automated approach for labeling data using verified fact-checked statements on a Twitter dataset.
arXiv Detail & Related papers (2022-09-07T13:55:59Z)
- Investigating the Validity of Botometer-based Social Bot Studies [0.0]
Social bots are assumed to be automated social media accounts operated by malicious actors with the goal of manipulating public opinion.
Social bot activity has been reported in many different political contexts, including the U.S. presidential elections.
We point out a fundamental theoretical flaw in the widely-used study design for estimating the prevalence of social bots.
arXiv Detail & Related papers (2022-07-23T09:31:30Z)
- Adherence to Misinformation on Social Media Through Socio-Cognitive and Group-Based Processes [79.79659145328856]
We argue that when misinformation proliferates, this happens because the social media environment enables adherence to misinformation.
We make the case that polarization and misinformation adherence are closely tied.
arXiv Detail & Related papers (2022-06-30T12:34:24Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted, using the Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
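SHAP attributes a model's prediction to its input features using Shapley values. The toy sketch below computes exact Shapley values for a hand-written bot score by averaging marginal contributions over all feature orderings; the scorer, features, and baseline are illustrative assumptions standing in for the paper's trained XGBoost model:

```python
from itertools import permutations

def bot_score(tweets_per_day, followers, account_age_days):
    # Toy stand-in for a trained classifier (NOT XGBoost): a hand-written
    # score over three profile features, with one interaction term.
    score = 0.0
    if tweets_per_day > 50:
        score += 2.0
    if followers < 20 and tweets_per_day > 50:
        score += 1.5  # interaction: posts heavily but has few followers
    if account_age_days < 30:
        score += 1.0
    return score

# Assumed "typical account" used as the reference point for attributions.
BASELINE = {"tweets_per_day": 5, "followers": 500, "account_age_days": 1000}

def shapley_values(instance):
    """Exact Shapley values: average each feature's marginal contribution
    over every possible order of switching features from baseline to the
    instance's values. Feasible here because there are only 3 features."""
    names = list(BASELINE)
    contrib = {n: 0.0 for n in names}
    perms = list(permutations(names))
    for order in perms:
        current = dict(BASELINE)
        prev = bot_score(**current)
        for name in order:
            current[name] = instance[name]
            new = bot_score(**current)
            contrib[name] += new - prev
            prev = new
    return {n: c / len(perms) for n, c in contrib.items()}
```

By construction the attributions sum to the difference between the instance's score and the baseline's score; libraries like SHAP approximate this efficiently for tree ensembles such as XGBoost.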
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
- Exposure to Social Engagement Metrics Increases Vulnerability to Misinformation [12.737240668157424]
We find that exposure to social engagement signals increases the vulnerability of users to misinformation.
To reduce the spread of misinformation, we call for technology platforms to rethink the display of social engagement metrics.
arXiv Detail & Related papers (2020-05-10T14:55:50Z)
- Echo Chambers on Social Media: A comparative analysis [64.2256216637683]
We introduce an operational definition of echo chambers and perform a massive comparative analysis on 1B pieces of content produced by 1M users on four social media platforms.
We infer the leaning of users about controversial topics and reconstruct their interaction networks by analyzing different features.
We find support for the hypothesis that platforms implementing news feed algorithms like Facebook may elicit the emergence of echo-chambers.
arXiv Detail & Related papers (2020-04-20T20:00:27Z)
- Measuring Social Biases of Crowd Workers using Counterfactual Queries [84.10721065676913]
Social biases based on gender, race, etc. have been shown to pollute machine learning (ML) pipelines, predominantly via biased training datasets.
Crowdsourcing, a popular cost-effective measure to gather labeled training datasets, is not immune to the inherent social biases of crowd workers.
We propose a new method based on counterfactual fairness to quantify the degree of inherent social bias in each crowd worker.
arXiv Detail & Related papers (2020-04-04T21:41:55Z)
- Curating Social Media Data [0.0]
We propose a data curation pipeline, namely CrowdCorrect, to enable analysts to cleanse and curate social data.
Our pipeline provides an automatic feature extraction from a corpus of social media data using existing in-house tools.
The implementation of this pipeline also includes a set of tools for automatically creating micro-tasks to facilitate the contribution of crowd users in curating the raw data.
arXiv Detail & Related papers (2020-02-21T10:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.