AraCOVID19-SSD: Arabic COVID-19 Sentiment and Sarcasm Detection Dataset
- URL: http://arxiv.org/abs/2110.01948v1
- Date: Tue, 5 Oct 2021 11:24:24 GMT
- Title: AraCOVID19-SSD: Arabic COVID-19 Sentiment and Sarcasm Detection Dataset
- Authors: Mohamed Seghir Hadj Ameur, Hassina Aliane
- Abstract summary: This paper builds and releases AraCOVID19-SSD, a manually annotated Arabic COVID-19 sarcasm and sentiment detection dataset containing 5,162 tweets.
Many of these users employ sarcasm to convey their intended meaning in a humorous and indirect way, which makes it hard for computer-based applications to automatically identify their goal and the level of harm they can inflict.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Coronavirus disease (COVID-19) is an infectious respiratory disease that was
first discovered in late December 2019 in Wuhan, China, and then spread
worldwide, causing widespread panic and death. Users of social networking sites
such as Facebook and Twitter have been focused on reading, publishing, and
sharing news, tweets, and articles about the newly emerging pandemic.
Many of these users employ sarcasm to convey their intended meaning in a
humorous and indirect way, which makes it hard for computer-based
applications to automatically understand their goal and the level of harm
they can inflict. Motivated by the emerging need for annotated
datasets that tackle these kinds of problems in the context of COVID-19, this
paper builds and releases AraCOVID19-SSD, a manually annotated Arabic COVID-19
sarcasm and sentiment detection dataset containing 5,162 tweets. To confirm the
practical utility of the dataset, it has been carefully analyzed and
tested using several classification models.
Related papers
- Sarcasm Detection in a Disaster Context [103.93691731605163]
We introduce HurricaneSARC, a dataset of 15,000 tweets annotated for intended sarcasm.
Our best model is able to obtain as much as 0.70 F1 on our dataset.
Published on arXiv: 2023-08-16T05:58:12Z
- "COVID-19 was a FIFA conspiracy #curropt": An Investigation into the Viral Spread of COVID-19 Misinformation [60.268682953952506]
We estimate the extent to which misinformation has influenced the course of the COVID-19 pandemic using natural language processing models.
We provide a strategy to combat social media posts that are likely to cause widespread harm.
Published on arXiv: 2022-06-12T19:41:01Z
- Cross-lingual COVID-19 Fake News Detection [54.125563009333995]
We make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) using only fact-checked news in a high-resource language (English).
We propose a deep learning framework named CrossFake to jointly encode the cross-lingual news body texts and capture the news content.
Empirical results on our dataset demonstrate the effectiveness of CrossFake under the cross-lingual setting.
Published on arXiv: 2021-10-13T04:44:02Z
- AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News and Hate Speech Detection Dataset [0.0]
"AraCOVID19-MFH" is a manually annotated multi-label Arabic COVID-19 fake news and hate speech detection dataset.
Our dataset contains 10,828 Arabic tweets annotated with 10 different labels.
It can also be used for hate speech detection, opinion/news classification, dialect identification, and many other tasks.
Published on arXiv: 2021-05-07T09:52:44Z
- ArCorona: Analyzing Arabic Tweets in the Early Days of Coronavirus (COVID-19) Pandemic [3.057212947792573]
We present the largest manually annotated dataset of Arabic tweets related to COVID-19.
We describe annotation guidelines, analyze our dataset, and build effective machine learning and transformer-based models for classification.
Published on arXiv: 2020-12-02T19:05:25Z
- Understanding the temporal evolution of COVID-19 research through machine learning and natural language processing [66.63200823918429]
The outbreak of the novel coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been continuously affecting human lives and communities around the world.
We used multiple data sources, namely PubMed and arXiv, and built several machine learning models to characterize the landscape of current COVID-19 research.
Our findings confirm that the types of research available in PubMed and arXiv differ significantly, with the former exhibiting greater diversity in terms of COVID-19 related issues.
Published on arXiv: 2020-07-22T18:02:39Z
- On Analyzing Antisocial Behaviors Amid COVID-19 Pandemic [5.900114841365645]
Despite the gravity of the issue, very few studies have examined online antisocial behaviors amid the COVID-19 pandemic.
In this paper, we fill the research gap by collecting and annotating a large dataset of over 40 million COVID-19 related tweets.
We also conduct an empirical analysis of our annotated dataset and find that new abusive lexicons were introduced amid the COVID-19 pandemic.
Published on arXiv: 2020-07-21T11:11:35Z
- TICO-19: the Translation Initiative for Covid-19 [112.5601530395345]
The Translation Initiative for COvid-19 (TICO-19) has made test and development data available to AI and MT researchers in 35 different languages.
The same data is translated into all of the languages represented, meaning that testing or development can be done for any pairing of languages in the set.
Published on arXiv: 2020-07-03T16:26:17Z
- Cross-lingual Transfer Learning for COVID-19 Outbreak Alignment [90.12602012910465]
We train on Italy's early COVID-19 outbreak through Twitter and transfer to several other countries.
Our experiments show strong results with up to 0.85 Spearman correlation in cross-country predictions.
Published on arXiv: 2020-06-05T02:04:25Z
- Large Arabic Twitter Dataset on COVID-19 [0.7734726150561088]
Coronavirus disease (COVID-19), which emerged in late December 2019 in China, is now spreading rapidly across the globe.
The number of confirmed cases worldwide has passed two and a half million, with over 180,000 fatalities.
This work describes the first Arabic tweets dataset on COVID-19 that we have been collecting since January 1st, 2020.
Published on arXiv: 2020-04-09T01:07:12Z
- Mining Coronavirus (COVID-19) Posts in Social Media [3.04585143845864]
The World Health Organization (WHO) characterized the novel coronavirus (COVID-19) as a global pandemic on March 11th, 2020.
In this article, we report the preliminary results of our study on automatically detecting positive reports of COVID-19 in social media posts using state-of-the-art machine learning models.
Published on arXiv: 2020-03-28T23:38:50Z