ArCorona: Analyzing Arabic Tweets in the Early Days of Coronavirus (COVID-19) Pandemic
- URL: http://arxiv.org/abs/2012.01462v3
- Date: Mon, 1 Mar 2021 12:24:15 GMT
- Title: ArCorona: Analyzing Arabic Tweets in the Early Days of Coronavirus (COVID-19) Pandemic
- Authors: Hamdy Mubarak and Sabit Hassan
- Abstract summary: We present the largest manually annotated dataset of Arabic tweets related to COVID-19.
We describe annotation guidelines, analyze our dataset, and build effective machine learning and transformer-based models for classification.
- Score: 3.057212947792573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the past few months, huge numbers of tweets and discussions
about Coronavirus (COVID-19) have circulated in the Arab region. It is
important for policy makers and the public to identify the types of shared
tweets in order to better understand public behavior, topics of interest,
requests from governments, sources of tweets, etc. It is also crucial to
prevent the spread of rumors and misinformation about the virus and about bad
cures. To this end, we present the largest manually annotated dataset of
Arabic tweets related to COVID-19. We describe our annotation guidelines,
analyze the dataset, and build effective machine learning and
transformer-based models for classification.
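The abstract names classical machine learning and transformer-based classifiers without further detail. As a rough illustration of the classical side only, here is a minimal sketch of a multinomial Naive Bayes classifier over character n-grams, a common baseline for short, noisy text such as tweets. This is a swapped-in technique, not the authors' model; the class name, the labels "advice" and "rumor", and the training tweets are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=2, n_max=4):
    """Character n-grams of a whitespace-normalized, lowercased tweet.

    Character n-grams need no tokenizer and tolerate spelling variation,
    and they work the same way on Arabic script as on Latin script.
    """
    text = " ".join(text.lower().split())
    padded = f" {text} "  # pad so word boundaries become part of the grams
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


class NaiveBayesTweetClassifier:
    """Hypothetical multinomial Naive Bayes over character n-grams, add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.gram_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        self.gram_totals = {c: sum(cnt.values()) for c, cnt in self.gram_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / n_docs)  # log prior P(label)
            denom = self.gram_totals[label] + v
            for g in char_ngrams(text):
                if g in self.vocab:  # n-grams never seen in training are skipped
                    score += math.log((self.gram_counts[label][g] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data; real labels would come from the annotation scheme.
texts = [
    "wash your hands and stay home",
    "stay home and wear a mask",
    "garlic cures the virus",
    "drinking hot water kills the virus",
]
labels = ["advice", "advice", "rumor", "rumor"]

clf = NaiveBayesTweetClassifier().fit(texts, labels)
print(clf.predict("please stay home"))
print(clf.predict("garlic kills the virus"))
```

A baseline like this is cheap to train and gives a reference point against which a fine-tuned transformer can be compared.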
Related papers
- "COVID-19 was a FIFA conspiracy #curropt": An Investigation into the Viral Spread of COVID-19 Misinformation [60.268682953952506]
We estimate the extent to which misinformation has influenced the course of the COVID-19 pandemic using natural language processing models.
We provide a strategy to combat social media posts that are likely to cause widespread harm.
arXiv Detail & Related papers (2022-06-12T19:41:01Z)
- ArCovidVac: Analyzing Arabic Tweets About COVID-19 Vaccination [7.594204373985492]
We release the first and largest manually annotated Arabic tweet dataset, ArCovidVac, for the COVID-19 vaccination campaign.
The dataset is enriched with different layers of annotation, including (i) informativeness (more vs. less important tweets); (ii) fine-grained tweet content types (e.g., advice, rumors, restriction, authentic news/information); and (iii) stance towards vaccination.
arXiv Detail & Related papers (2022-01-17T16:19:21Z)
- Cross-lingual COVID-19 Fake News Detection [54.125563009333995]
We make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) using only fact-checked news in a high-resource language (English).
We propose a deep learning framework named CrossFake to jointly encode the cross-lingual news body texts and capture the news content.
Empirical results on our dataset demonstrate the effectiveness of CrossFake under the cross-lingual setting.
arXiv Detail & Related papers (2021-10-13T04:44:02Z)
- AraCOVID19-SSD: Arabic COVID-19 Sentiment and Sarcasm Detection Dataset [0.0]
This paper builds and releases AraCOVID19-SSD, a manually annotated Arabic COVID-19 sarcasm and sentiment detection dataset containing 5,162 tweets.
Many social media users employ sarcasm to convey their intended meaning in a humorous, indirect way, making it hard for computer-based applications to automatically understand and identify their goal and the level of harm they can inflict.
arXiv Detail & Related papers (2021-10-05T11:24:24Z)
- CML-COVID: A Large-Scale COVID-19 Twitter Dataset with Latent Topics, Sentiment and Location Information [0.0]
CML-COVID is a COVID-19 Twitter data set of 19,298,967 tweets from 5,977,653 unique individuals.
These tweets were collected between March 2020 and July 2020 using the COVID-19-related query terms "coronavirus", "covid", and "mask".
arXiv Detail & Related papers (2021-01-28T18:59:10Z)
- Eating Garlic Prevents COVID-19 Infection: Detecting Misinformation on the Arabic Content of Twitter [0.23624125155742054]
We construct a large Arabic dataset related to COVID-19 misinformation and gold-annotate the tweets into two categories: misinformation or not.
We apply eight different traditional and deep machine learning models, with different features including word embeddings and word frequency.
Experiments show that optimizing the area under the curve (AUC) improves the models' performance and the Extreme Gradient Boosting (XGBoost) presents the highest accuracy in detecting COVID-19 misinformation online.
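The summary credits part of the improvement to optimizing the area under the ROC curve (AUC). The paper's exact procedure is not given here, so the following is only a minimal sketch of how AUC itself can be computed, using the Mann-Whitney rank-sum formulation: the probability that a randomly chosen positive (misinformation) tweet receives a higher score than a randomly chosen negative one. The function name and the toy labels and scores are illustrative assumptions.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of random positive > score of random negative).

    labels are 1 (positive) or 0 (negative). Tied scores get the average
    rank, so a positive/negative tie counts as half a correct ordering.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U for the positives
    return u / (n_pos * n_neg)


# Toy example: 1 = misinformation, 0 = not misinformation.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In practice an established implementation such as scikit-learn's `roc_auc_score` would be used; the rank formulation is shown because it makes the tie handling explicit.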
arXiv Detail & Related papers (2021-01-09T22:52:21Z)
- Understanding the Hoarding Behaviors during the COVID-19 Pandemic using Large Scale Social Media Data [77.34726150561087]
We analyze the hoarding and anti-hoarding patterns of over 42,000 unique Twitter users in the United States from March 1 to April 30, 2020.
We find the percentage of females in both hoarding and anti-hoarding groups is higher than that of the general Twitter user population.
The LIWC anxiety mean for the hoarding-related tweets is significantly higher than the baseline Twitter anxiety mean.
arXiv Detail & Related papers (2020-10-15T16:02:25Z)
- A System for Worldwide COVID-19 Information Aggregation [92.60866520230803]
We build a system for worldwide COVID-19 information aggregation containing reliable articles from 10 regions in 7 languages sorted by topics.
A neural machine translation module translates articles in other languages into Japanese and English.
A BERT-based topic classifier trained on our article-topic pair dataset helps users efficiently find the information they are interested in.
arXiv Detail & Related papers (2020-07-28T01:33:54Z)
- TICO-19: the Translation Initiative for Covid-19 [112.5601530395345]
The Translation Initiative for COvid-19 (TICO-19) has made test and development data available to AI and MT researchers in 35 different languages.
The same data is translated into all of the languages represented, meaning that testing or development can be done for any pairing of languages in the set.
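Because every language in the set is translated from the same source material, any two languages form a usable test pair. Assuming all 35 languages are mutually aligned, the number of possible pairs follows from counting ordered pairs (translation directions) and unordered pairs; the helper below is a hypothetical illustration of that arithmetic (it needs Python 3.8+ for `math.perm` and `math.comb`).

```python
import math
from itertools import permutations


def count_language_pairs(n_languages):
    """Count translation directions (ordered) and language pairs (unordered)
    for an n-way parallel set in which every language is aligned."""
    directed = math.perm(n_languages, 2)    # ar->en and en->ar counted separately
    undirected = math.comb(n_languages, 2)  # {ar, en} counted once
    return directed, undirected


# Small illustration with three hypothetical language codes.
langs = ["ar", "en", "fr"]
directions = list(permutations(langs, 2))
print(count_language_pairs(len(langs)), len(directions))  # (6, 3) 6
print(count_language_pairs(35))                           # (1190, 595)
```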
arXiv Detail & Related papers (2020-07-03T16:26:17Z)
- Cross-lingual Transfer Learning for COVID-19 Outbreak Alignment [90.12602012910465]
We train on Italy's early COVID-19 outbreak through Twitter and transfer to several other countries.
Our experiments show strong results with up to 0.85 Spearman correlation in cross-country predictions.
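Spearman's rank correlation, the metric quoted above, is the Pearson correlation computed on ranks, so it rewards predictions that preserve the ordering of the true values even when the predicted magnitudes are off. The sketch below is a plain-Python illustration that assigns average ranks to ties; the function names and example values are hypothetical and not taken from the paper.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(sxx * syy)


# Hypothetical predicted vs. observed values for five time points.
print(round(spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 3))
```

`scipy.stats.spearmanr` provides the same statistic along with a p-value.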
arXiv Detail & Related papers (2020-06-05T02:04:25Z)
- Large Arabic Twitter Dataset on COVID-19 [0.7734726150561088]
The coronavirus disease (COVID-19), which emerged in China in late December 2019, is now rapidly spreading across the globe.
The number of confirmed cases worldwide has passed two and a half million, with over 180,000 fatalities.
This work describes the first Arabic tweets dataset on COVID-19 that we have been collecting since January 1st, 2020.
arXiv Detail & Related papers (2020-04-09T01:07:12Z)