FakeCovid -- A Multilingual Cross-domain Fact Check News Dataset for
COVID-19
- URL: http://arxiv.org/abs/2006.11343v1
- Date: Fri, 19 Jun 2020 19:48:00 GMT
- Authors: Gautam Kishore Shahi, Durgesh Nandini
- Abstract summary: We present a first multilingual cross-domain dataset of 5182 fact-checked news articles for COVID-19.
We have collected the fact-checked articles from 92 different fact-checking websites after obtaining references from Poynter and Snopes.
The dataset is in 40 languages from 105 countries.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present a first multilingual cross-domain dataset of 5182
fact-checked news articles for COVID-19, collected from 04/01/2020 to
15/05/2020. We have collected the fact-checked articles from 92 different
fact-checking websites after obtaining references from Poynter and Snopes. We
have manually annotated articles into 11 different categories of the
fact-checked news according to their content. The dataset is in 40 languages
from 105 countries. We have built a classifier to detect fake news and present
results for automatic fake news detection and its class. Our model achieves
an F1 score of 0.76 in detecting the false class versus other fact-checked articles.
The FakeCovid dataset is available on GitHub.
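For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score for the "false" class is computed in a two-class setting ("false" versus "other"). The labels below are toy values made up for illustration and are not taken from the FakeCovid dataset; the function names are hypothetical and this is not the authors' evaluation code.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true positives, false positives and false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is defined as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (assumption, for illustration only).
y_true = ["false", "false", "other", "false", "other", "other"]
y_pred = ["false", "other", "other", "false", "false", "other"]

tp, fp, fn = confusion_counts(y_true, y_pred, positive="false")
f1_false = f1_score(tp, fp, fn)
print(f"tp={tp} fp={fp} fn={fn} F1(false)={f1_false:.3f}")
```

A reported value such as 0.76 would result from applying the same formula to a model's predictions on held-out articles.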
Related papers
- Written and spoken corpus of real and fake social media postings about
COVID-19 [0.0]
The data was analysed using the Linguistic Inquiry and Word Count (LIWC) software to detect patterns in linguistic data.
The results indicate a set of linguistic features that distinguish fake news from real news in both written and speech data.
arXiv Detail & Related papers (2023-10-06T13:21:04Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis that cross-lingual evidence can serve as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- UrduFake@FIRE2020: Shared Track on Fake News Identification in Urdu [62.6928395368204]
This paper gives an overview of the first shared task at FIRE 2020 on fake news detection in the Urdu language.
The goal is to identify fake news using a dataset composed of 900 annotated news articles for training and 400 news articles for testing.
The dataset contains news in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and (v) Business.
arXiv Detail & Related papers (2022-07-25T03:46:51Z)
- Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2020 [62.6928395368204]
The task was posed as a binary classification task, in which the goal is to differentiate between real and fake news.
We provided a dataset divided into 900 annotated news articles for training and 400 news articles for testing.
42 teams from 6 different countries (India, China, Egypt, Germany, Pakistan, and the UK) registered for the task.
arXiv Detail & Related papers (2022-07-25T03:41:32Z)
- UrduFake@FIRE2021: Shared Track on Fake News Identification in Urdu [55.41644538483948]
This study reports the second shared task, named UrduFake@FIRE2021, on fake news detection in the Urdu language.
The proposed systems were based on various count-based features and used different classifiers as well as neural network architectures.
The stochastic gradient descent (SGD) algorithm outperformed other classifiers and achieved an F-score of 0.679.
arXiv Detail & Related papers (2022-07-11T19:15:04Z)
- Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2021 [55.41644538483948]
The goal of the shared task is to motivate the community to come up with efficient methods for solving this vital problem.
The training set contains 1300 annotated news articles -- 750 real news, 550 fake news, while the testing set contains 300 news articles -- 200 real, 100 fake news.
The best performing system obtained an F1-macro score of 0.679, which is lower than the past year's best result of 0.907 F1-macro.
arXiv Detail & Related papers (2022-07-11T18:58:36Z)
- Cross-lingual COVID-19 Fake News Detection [54.125563009333995]
We make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) using only the fact-checked news in a high-resource language (English).
We propose a deep learning framework named CrossFake to jointly encode the cross-lingual news body texts and capture the news content.
Empirical results on our dataset demonstrate the effectiveness of CrossFake under the cross-lingual setting.
arXiv Detail & Related papers (2021-10-13T04:44:02Z)
- NoFake at CheckThat! 2021: Fake News Detection Using BERT [0.0]
We present a BERT-based classification model to predict the domain and classification.
We achieved a macro F1 score of 83.76% for Task 3A and 85.55% for Task 3B using the additional training data.
arXiv Detail & Related papers (2021-08-11T19:13:04Z)
- AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News and Hate Speech
Detection Dataset [0.0]
"AraCOVID19-MFH" is a manually annotated multi-label Arabic COVID-19 fake news and hate speech detection dataset.
Our dataset contains 10,828 Arabic tweets annotated with 10 different labels.
It can also be used for hate speech detection, opinion/news classification, dialect identification, and many other tasks.
arXiv Detail & Related papers (2021-05-07T09:52:44Z)
- A Heuristic-driven Uncertainty based Ensemble Framework for Fake News
Detection in Tweets and News Articles [5.979726271522835]
We describe a novel Fake News Detection system that automatically identifies whether a news item is "real" or "fake".
We have used an ensemble model consisting of pre-trained models followed by a statistical feature fusion network.
Our proposed framework also quantifies reliable predictive uncertainty along with a proper class output confidence level for the classification task.
arXiv Detail & Related papers (2021-04-05T06:35:30Z)
- Hostility Detection and Covid-19 Fake News Detection in Social Media [1.3499391168620467]
We build a model that makes use of an abusive language detector and features extracted via Hindi BERT and Hindi FastText models.
We also build models to identify fake news related to Covid-19 in English tweets.
arXiv Detail & Related papers (2021-01-15T03:24:36Z)