AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News and Hate Speech
Detection Dataset
- URL: http://arxiv.org/abs/2105.03143v1
- Date: Fri, 7 May 2021 09:52:44 GMT
- Title: AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News and Hate Speech
Detection Dataset
- Authors: Mohamed Seghir Hadj Ameur, Hassina Aliane
- Abstract summary: "AraCOVID19-MFH" is a manually annotated multi-label Arabic COVID-19 fake news and hate speech detection dataset.
Our dataset contains 10,828 Arabic tweets annotated with 10 different labels.
It can also be used for hate speech detection, opinion/news classification, dialect identification, and many other tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Along with the COVID-19 pandemic, an "infodemic" of false and misleading
information has emerged and has complicated the COVID-19 response efforts.
Social networking sites such as Facebook and Twitter have contributed largely
to the spread of rumors, conspiracy theories, hate, xenophobia, racism, and
prejudice. To combat the spread of fake news, researchers around the world have
and are still making considerable efforts to build and share COVID-19 related
research articles, models, and datasets. This paper releases "AraCOVID19-MFH" a
manually annotated multi-label Arabic COVID-19 fake news and hate speech
detection dataset. Our dataset contains 10,828 Arabic tweets annotated with 10
different labels. The labels have been designed to consider some aspects
relevant to the fact-checking task, such as the tweet's check worthiness,
positivity/negativity, and factuality. To confirm our annotated dataset's
practical utility, we used it to train and evaluate several classification
models and reported the obtained results. Though the dataset is mainly designed
for fake news detection, it can also be used for hate speech detection,
opinion/news classification, dialect identification, and many other tasks.
Related papers
- Machine Learning-based Automatic Annotation and Detection of COVID-19
Fake News [8.020736472947581]
COVID-19 impacted every part of the world, although the misinformation about the outbreak traveled faster than the virus.
Existing work neglects the presence of bots that act as a catalyst in the spread.
We propose an automated approach for labeling data using verified fact-checked statements on a Twitter dataset.
arXiv Detail & Related papers (2022-09-07T13:55:59Z) - UrduFake@FIRE2021: Shared Track on Fake News Identification in Urdu [55.41644538483948]
This study reports the second shared task named as UrduFake@FIRE2021 on identifying fake news detection in Urdu language.
The proposed systems were based on various count-based features and used different classifiers as well as neural network architectures.
The gradient descent (SGD) algorithm outperformed other classifiers and achieved 0.679 F-score.
arXiv Detail & Related papers (2022-07-11T19:15:04Z) - Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z) - Cross-lingual COVID-19 Fake News Detection [54.125563009333995]
We make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) only using the fact-checked news in a high-resource language (English)
We propose a deep learning framework named CrossFake to jointly encode the cross-lingual news body texts and capture the news content.
Empirical results on our dataset demonstrate the effectiveness of CrossFake under the cross-lingual setting.
arXiv Detail & Related papers (2021-10-13T04:44:02Z) - Half-Truth: A Partially Fake Audio Detection Dataset [60.08010668752466]
This paper develops a dataset for half-truth audio detection (HAD)
Partially fake audio in the HAD dataset involves only changing a few words in an utterance.
We can not only detect fake uttrances but also localize manipulated regions in a speech using this dataset.
arXiv Detail & Related papers (2021-04-08T08:57:13Z) - Hostility Detection and Covid-19 Fake News Detection in Social Media [1.3499391168620467]
We build a model that makes use of an abusive language detector and features extracted via Hindi BERT and Hindi FastText models.
We also build models to identify fake news related to Covid-19 in English tweets.
arXiv Detail & Related papers (2021-01-15T03:24:36Z) - Evaluating Deep Learning Approaches for Covid19 Fake News Detection [0.0]
We look at automated techniques for fake news detection from a data mining perspective.
We evaluate different supervised text classification algorithms on Contraint@AAAI 2021 Covid-19 Fake news detection dataset.
We report the best accuracy of 98.41% on the Covid-19 Fake news detection dataset.
arXiv Detail & Related papers (2021-01-11T16:39:03Z) - Eating Garlic Prevents COVID-19 Infection: Detecting Misinformation on
the Arabic Content of Twitter [0.23624125155742054]
We construct a large Arabic dataset related to COVID-19 misinformation and gold-annotate the tweets into two categories: misinformation or not.
We apply eight different traditional and deep machine learning models, with different features including word embeddings and word frequency.
Experiments show that optimizing the area under the curve (AUC) improves the models' performance and the Extreme Gradient Boosting (XGBoost) presents the highest accuracy in detecting COVID-19 misinformation online.
arXiv Detail & Related papers (2021-01-09T22:52:21Z) - Trawling for Trolling: A Dataset [56.1778095945542]
We present a dataset that models trolling as a subcategory of offensive content.
The dataset has 12,490 samples, split across 5 classes; Normal, Profanity, Trolling, Derogatory and Hate Speech.
arXiv Detail & Related papers (2020-08-02T17:23:55Z) - Misinformation Has High Perplexity [55.47422012881148]
We propose to leverage the perplexity to debunk false claims in an unsupervised manner.
First, we extract reliable evidence from scientific and news sources according to sentence similarity to the claims.
Second, we prime a language model with the extracted evidence and finally evaluate the correctness of given claims based on the perplexity scores at debunking time.
arXiv Detail & Related papers (2020-06-08T15:13:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.