Machine Learning-based Automatic Annotation and Detection of COVID-19
Fake News
- URL: http://arxiv.org/abs/2209.03162v1
- Date: Wed, 7 Sep 2022 13:55:59 GMT
- Authors: Mohammad Majid Akhtar, Bibhas Sharma, Ishan Karunanayake, Rahat
Masood, Muhammad Ikram, Salil S. Kanhere
- Abstract summary: COVID-19 impacted every part of the world, yet misinformation about the outbreak traveled faster than the virus.
Existing work neglects the presence of bots that act as a catalyst in the spread.
We propose an automated approach for labeling data using verified fact-checked statements on a Twitter dataset.
- Score: 8.020736472947581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: COVID-19 impacted every part of the world, yet misinformation about
the outbreak traveled faster than the virus. Misinformation spread through
online social networks (OSNs) often steered people away from correct medical
practices. In particular, OSN bots have been a primary means of disseminating
false information and initiating cyber propaganda. Existing work neglects the
presence of bots that act as a catalyst in the spread and focuses on fake news
detection in 'articles shared in posts' rather than the post (textual) content.
Most work on misinformation detection builds predictive models on manually
labeled datasets, which are hard to scale. In this research, we
overcome this challenge of data scarcity by proposing an automated approach for
labeling data using verified fact-checked statements on a Twitter dataset. In
addition, we combine textual features with user-level features (such as
followers count and friends count) and tweet-level features (such as number of
mentions, hashtags, and URLs in a tweet) as additional indicators for
detecting misinformation. Moreover, we analyze the presence of bots in tweets and
show that bots change their behavior over time and are most active during the
misinformation campaign. We collected 10.22 million COVID-19-related tweets and
used our annotation model to build an extensive and original ground truth
dataset for classification purposes. We utilize various machine learning models
to accurately detect misinformation, and our best classification model achieves
82% precision, 96% recall, and a 3.58% false positive rate. Also, our bot
analysis indicates that bots generated approximately 10% of misinformation
tweets. Our methodology results in substantial exposure of false information,
thus improving the trustworthiness of information disseminated through social
media platforms.
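The user-level and tweet-level features named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Tweet` container, the regular expressions, and the `extract_features` helper are hypothetical names introduced here, and the counting rules (for example, ignoring `@` inside e-mail addresses) are assumptions.

```python
import re
from dataclasses import dataclass

# Assumed patterns; the paper does not specify how entities are counted.
MENTION_RE = re.compile(r"(?<!\w)@\w+")
HASHTAG_RE = re.compile(r"(?<!\w)#\w+")
URL_RE = re.compile(r"https?://\S+")


@dataclass
class Tweet:
    """Hypothetical container for one tweet and its author's profile counts."""
    text: str
    followers_count: int
    friends_count: int


def extract_features(tweet: Tweet) -> dict:
    """Return user-level and tweet-level features as a flat dictionary."""
    # Strip URLs first so a fragment such as '/#top' is not counted as a hashtag.
    text_without_urls = URL_RE.sub(" ", tweet.text)
    return {
        # User-level features
        "followers_count": tweet.followers_count,
        "friends_count": tweet.friends_count,
        # Tweet-level features
        "num_mentions": len(MENTION_RE.findall(text_without_urls)),
        "num_hashtags": len(HASHTAG_RE.findall(text_without_urls)),
        "num_urls": len(URL_RE.findall(tweet.text)),
    }


if __name__ == "__main__":
    example = Tweet(
        text="@WHO says masks work #COVID19 #health https://t.co/abc",
        followers_count=120,
        friends_count=80,
    )
    print(extract_features(example))
```

In a pipeline like the one the abstract describes, such numeric features would presumably be concatenated with a text representation of the tweet before being passed to a classifier; that step is omitted here because the abstract does not state which representation is used.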
Related papers
- Entendre, a Social Bot Detection Tool for Niche, Fringe, and Extreme Social Media [1.4913052010438639]
We introduce Entendre, an open-access, scalable, and platform-agnostic bot detection framework.
We exploit the idea that most social platforms share a generic template, where users can post content, approve content, and provide a bio.
To demonstrate Entendre's effectiveness, we used it to explore the presence of bots among accounts posting racist content on the now-defunct right-wing platform Parler.
(2024-08-13T13:50:49Z)
- My Brother Helps Me: Node Injection Based Adversarial Attack on Social Bot Detection [69.99192868521564]
Social platforms such as Twitter are under siege from a multitude of fraudulent users.
Due to the structure of social networks, the majority of methods are based on graph neural networks (GNNs), which are susceptible to attacks.
We propose a node injection-based adversarial attack method designed to deceive bot detection models.
(2023-10-11T03:09:48Z)
- BotArtist: Generic approach for bot detection in Twitter via semi-automatic machine learning pipeline [47.61306219245444]
Twitter has become a target for bots and fake accounts, resulting in the spread of false information and manipulation.
This paper introduces a semi-automatic machine learning pipeline (SAMLP) designed to address the challenges correlated with machine learning model development.
We develop a comprehensive bot detection model named BotArtist, based on user profile features.
(2023-05-31T09:12:35Z)
- TwiBot-22: Towards Graph-Based Twitter Bot Detection [39.359825215347655]
TwiBot-22 is a graph-based Twitter bot detection benchmark that presents the largest dataset to date.
We re-implement 35 representative Twitter bot detection baselines and evaluate them on 9 datasets, including TwiBot-22.
To facilitate further research, we consolidate all implemented codes and datasets into the TwiBot-22 evaluation framework.
(2022-06-09T15:23:37Z)
- Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 - 7.69% F1 score on two public datasets.
(2022-03-10T14:24:19Z)
- Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
(2021-12-16T03:37:20Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted, using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
(2021-12-08T14:12:24Z)
- Combining exogenous and endogenous signals with a semi-supervised co-attention network for early detection of COVID-19 fake tweets [14.771202995527315]
During COVID-19, tweets with misinformation should be flagged and neutralized in their early stages to mitigate the damages.
Most existing methods for early detection of fake news assume that enough propagation information is available for large labeled tweets.
We present ENDEMIC, a novel early detection model which leverages exogenous and endogenous signals related to tweets.
(2021-04-12T10:01:44Z)
- Evaluating Deep Learning Approaches for Covid19 Fake News Detection [0.0]
We look at automated techniques for fake news detection from a data mining perspective.
We evaluate different supervised text classification algorithms on the Constraint@AAAI 2021 Covid-19 Fake news detection dataset.
We report the best accuracy of 98.41% on the Covid-19 Fake news detection dataset.
(2021-01-11T16:39:03Z)
- Predicting Misinformation and Engagement in COVID-19 Twitter Discourse in the First Months of the Outbreak [1.2059055685264957]
We examine nearly 505K COVID-19-related tweets from the initial months of the pandemic to understand misinformation as a function of bot behavior and engagement.
We found that real users tweet both facts and misinformation, while bots tweet proportionally more misinformation.
(2020-12-03T18:47:34Z)
- Detection of Novel Social Bots by Ensembles of Specialized Classifiers [60.63582690037839]
Malicious actors create inauthentic social media accounts controlled in part by algorithms, known as social bots, to disseminate misinformation and agitate online discussion.
We show that different types of bots are characterized by different behavioral features.
We propose a new supervised learning method that trains classifiers specialized for each class of bots and combines their decisions through the maximum rule.
(2020-06-11T22:59:59Z)
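The maximum rule mentioned in the ensemble-of-specialized-classifiers entry above can be illustrated with a short Python sketch. It only shows the combination step and assumes each specialized classifier has already produced a bot probability for the account; the class names, the `max_rule` helper, and the 0.5 threshold are hypothetical and not taken from that paper.

```python
from typing import Dict, Tuple


def max_rule(scores: Dict[str, float], threshold: float = 0.5) -> Tuple[bool, str, float]:
    """Combine per-class bot scores by keeping the most confident classifier.

    scores maps a (hypothetical) bot class name to the bot probability that
    the classifier specialized for that class assigned to one account.
    Returns (is_bot, winning_class, winning_score).
    """
    if not scores:
        raise ValueError("at least one classifier score is required")
    winning_class, winning_score = max(scores.items(), key=lambda item: item[1])
    # Assumed decision rule: flag the account if the maximum reaches the threshold.
    return winning_score >= threshold, winning_class, winning_score


if __name__ == "__main__":
    account_scores = {"spam_bot": 0.2, "fake_follower": 0.9, "political_bot": 0.4}
    print(max_rule(account_scores))
```

Taking the maximum means an account is flagged as soon as any one specialized classifier is confident, which fits the setting where different bot types show different behavioral features.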