ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation
Detection
- URL: http://arxiv.org/abs/2010.08768v2
- Date: Sat, 13 Mar 2021 20:26:35 GMT
- Title: ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation
Detection
- Authors: Fatima Haouari, Maram Hasanain, Reem Suwaileh, Tamer Elsayed
- Abstract summary: ArCOV19-Rumors is an Arabic COVID-19 Twitter dataset for misinformation detection composed of tweets containing claims from 27th January till the end of April 2020.
We collected 138 verified claims, mostly from popular fact-checking websites, and identified 9.4K tweets relevant to those claims.
Tweets were manually annotated by veracity to support research on misinformation detection, which is one of the major problems faced during a pandemic.
- Score: 6.688963029270579
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we introduce ArCOV19-Rumors, an Arabic COVID-19 Twitter dataset
for misinformation detection composed of tweets containing claims from 27th
January till the end of April 2020. We collected 138 verified claims, mostly
from popular fact-checking websites, and identified 9.4K tweets relevant to
those claims. Tweets were manually annotated by veracity to support research on
misinformation detection, which is one of the major problems faced during a
pandemic. ArCOV19-Rumors supports two levels of misinformation detection over
Twitter: verifying free-text claims (called claim-level verification) and
verifying claims expressed in tweets (called tweet-level verification). Our
dataset covers, in addition to health, claims related to other topical
categories that were influenced by COVID-19, namely social, politics, sports,
entertainment, and religion. Moreover, we present benchmarking results for
tweet-level verification on the dataset. We experimented with state-of-the-art
(SOTA) models from diverse approaches that exploit content, user profile
features, temporal features, or the propagation structure of conversational
threads for tweet verification.
Related papers
- Twitter Dataset on the Russo-Ukrainian War [68.713984286035]
We have initiated an ongoing dataset acquisition from the Twitter API.
The dataset has reached 57.3 million tweets originating from 7.7 million users.
We apply an initial volume and sentiment analysis; the dataset can also support further exploratory work on topic analysis, hate speech, and propaganda recognition, or help reveal potentially malicious entities such as botnets.
arXiv Detail & Related papers (2022-04-07T12:33:06Z)
- Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal
Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)
- Cross-lingual COVID-19 Fake News Detection [54.125563009333995]
We make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) using only the fact-checked news in a high-resource language (English).
We propose a deep learning framework named CrossFake to jointly encode the cross-lingual news body texts and capture the news content.
Empirical results on our dataset demonstrate the effectiveness of CrossFake under the cross-lingual setting.
arXiv Detail & Related papers (2021-10-13T04:44:02Z)
- FaVIQ: FAct Verification from Information-seeking Questions [77.7067957445298]
We construct a large-scale fact verification dataset called FaVIQ using information-seeking questions posed by real users.
Our claims are verified to be natural, contain little lexical bias, and require a complete understanding of the evidence for verification.
arXiv Detail & Related papers (2021-07-05T17:31:44Z)
- Understanding Information Spreading Mechanisms During COVID-19 Pandemic
by Analyzing the Impact of Tweet Text and User Features for Retweet
Prediction [6.658785818853953]
COVID-19 has affected the world economy and the daily life routine of almost everyone.
Social media platforms enable users to share information with other users who can reshare this information.
We propose two CNN- and RNN-based models and evaluate the performance of these models on a publicly available TweetsCOV19 dataset.
arXiv Detail & Related papers (2021-05-26T15:55:58Z)
- CML-COVID: A Large-Scale COVID-19 Twitter Dataset with Latent Topics,
Sentiment and Location Information [0.0]
CML-COVID is a COVID-19 Twitter data set of 19,298,967 tweets from 5,977,653 unique individuals.
These tweets were collected between March 2020 and July 2020 using the query terms coronavirus, covid and mask related to COVID-19.
arXiv Detail & Related papers (2021-01-28T18:59:10Z)
- Predicting Misinformation and Engagement in COVID-19 Twitter Discourse
in the First Months of the Outbreak [1.2059055685264957]
We examine nearly 505K COVID-19-related tweets from the initial months of the pandemic to understand misinformation as a function of bot behavior and engagement.
We found that real users tweet both facts and misinformation, while bots tweet proportionally more misinformation.
arXiv Detail & Related papers (2020-12-03T18:47:34Z)
- The Role of the Crowd in Countering Misinformation: A Case Study of the
COVID-19 Infodemic [15.885290526721544]
We focus on tweets related to the COVID-19 pandemic, analyzing the spread of misinformation, professional fact checks, and the crowd response to popular misleading claims about COVID-19.
We train a classifier to create a novel dataset of 155,468 COVID-19-related tweets, containing 33,237 false claims and 33,413 refuting arguments.
We observe that the surge in misinformation tweets results in a quick response and a corresponding increase in tweets that refute such misinformation.
arXiv Detail & Related papers (2020-11-11T13:48:44Z)
- Understanding the Hoarding Behaviors during the COVID-19 Pandemic using
Large Scale Social Media Data [77.34726150561087]
We analyze the hoarding and anti-hoarding patterns of over 42,000 unique Twitter users in the United States from March 1 to April 30, 2020.
We find the percentage of females in both hoarding and anti-hoarding groups is higher than that of the general Twitter user population.
The LIWC anxiety mean for the hoarding-related tweets is significantly higher than the baseline Twitter anxiety mean.
arXiv Detail & Related papers (2020-10-15T16:02:25Z)
- Misinformation Has High Perplexity [55.47422012881148]
We propose to leverage the perplexity to debunk false claims in an unsupervised manner.
First, we extract reliable evidence from scientific and news sources according to sentence similarity to the claims.
Second, we prime a language model with the extracted evidence and finally evaluate the correctness of given claims based on the perplexity scores at debunking time.
arXiv Detail & Related papers (2020-06-08T15:13:44Z)
- An Exploratory Study of COVID-19 Misinformation on Twitter [5.070542698701158]
During the COVID-19 pandemic, social media has become a home ground for misinformation.
We have conducted an exploratory study into the propagation, authors and content of misinformation on Twitter around the topic of COVID-19.
arXiv Detail & Related papers (2020-05-12T12:07:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.