MMCoVaR: Multimodal COVID-19 Vaccine Focused Data Repository for Fake
News Detection and a Baseline Architecture for Classification
- URL: http://arxiv.org/abs/2109.06416v1
- Date: Tue, 14 Sep 2021 03:57:50 GMT
- Title: MMCoVaR: Multimodal COVID-19 Vaccine Focused Data Repository for Fake
News Detection and a Baseline Architecture for Classification
- Authors: Mingxuan Chen, Xinqiao Chu, K.P. Subbalakshmi
- Abstract summary: We provide a new multimodal labeled dataset containing news articles and tweets on the COVID-19 vaccine.
We combine ratings from three news media ranking sites to classify the news dataset into two levels of credibility: reliable and unreliable.
We find that the proposed architecture has an F-Score of 0.919 and accuracy of 0.882 for fake news detection.
- Score: 1.587767395906846
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The outbreak of COVID-19 has resulted in an "infodemic" that has encouraged
the propagation of misinformation about COVID-19 and cure methods which, in
turn, could negatively affect the adoption of recommended public health
measures in the larger population. In this paper, we provide a new multimodal
(consisting of images, text and temporal information) labeled dataset
containing news articles and tweets on the COVID-19 vaccine. We collected 2,593
news articles from 80 publishers for one year between Feb 16th 2020 and May 8th
2021 and 24,184 Twitter posts (collected between April 17th 2021 and May 8th
2021). We combine ratings from three news media ranking sites: Media Bias
Chart, NewsGuard and Media Bias/Fact Check (MBFC) to classify the news dataset
into two levels of credibility: reliable and unreliable. The combination of
three filters allows for higher precision of labeling. We also propose a stance
detection mechanism to annotate tweets into three levels of credibility:
reliable, unreliable and inconclusive. We provide several statistics as well as
other analytics such as publisher distribution, publication date distribution,
topic analysis, etc. We also provide a novel architecture that classifies the
news data into misinformation or truth to provide a baseline performance for
this dataset. We find that the proposed architecture has an F-Score of 0.919
and accuracy of 0.882 for fake news detection. Furthermore, we provide
benchmark performance for misinformation detection on the tweet dataset. This new
multimodal dataset can be used in research on the COVID-19 vaccine, including
misinformation detection, influence of fake COVID-19 vaccine information, etc.
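The abstract says that ratings from three ranking sites are combined into a binary credibility label and that using three filters raises labeling precision. The exact combination rule is not given here, so the following is only a minimal sketch under one assumed rule: a publisher gets a label only when all three sites agree, and is left unlabeled otherwise. The function name `combine_ratings`, the boolean encoding of each site's verdict, and the sample publishers are all hypothetical.

```python
from typing import Optional

# Hypothetical encoding of one site's verdict on a publisher:
# True = rated reliable, False = rated unreliable, None = no rating available.
Verdict = Optional[bool]


def combine_ratings(media_bias_chart: Verdict,
                    newsguard: Verdict,
                    mbfc: Verdict) -> Optional[str]:
    """Return a label only if all three sites agree (assumed unanimity rule)."""
    verdicts = [media_bias_chart, newsguard, mbfc]
    if all(v is True for v in verdicts):
        return "reliable"
    if all(v is False for v in verdicts):
        return "unreliable"
    # Disagreement or a missing rating: leave the publisher unlabeled.
    return None


# Made-up publishers, for illustration only.
publishers = {
    "publisher_a": (True, True, True),
    "publisher_b": (False, False, False),
    "publisher_c": (True, False, True),   # sites disagree
    "publisher_d": (True, None, True),    # one site has no rating
}

labels = {name: combine_ratings(*v) for name, v in publishers.items()}
labeled = {name: label for name, label in labels.items() if label is not None}
print(labeled)
```

Requiring agreement trades coverage for precision: publishers with mixed or missing ratings drop out of the labeled set, which is one plausible reading of why three filters would give more precise labels.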
Related papers
- Utilization of Multinomial Naive Bayes Algorithm and Term Frequency
Inverse Document Frequency (TF-IDF Vectorizer) in Checking the Credibility of
News Tweet in the Philippines [0.0]
This paper utilizes ground truth-based annotations and TF-IDF as feature extraction for the news articles.
The model has an accuracy of 99.46% in training and 88.98% in predicting unseen data.
arXiv Detail & Related papers (2023-05-30T15:41:15Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis of the usage of cross-lingual evidence as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- CovidMis20: COVID-19 Misinformation Detection System on Twitter Tweets
using Deep Learning Models [1.4085013201980032]
This research presents the CovidMis20 dataset (COVID-19 Misinformation 2020 dataset), which consists of 1,375,592 tweets collected from February to July 2020.
This research was conducted using Bi-LSTM deep learning and an ensemble CNN+Bi-GRU for fake news detection.
arXiv Detail & Related papers (2022-09-13T00:43:44Z)
- Machine Learning-based Automatic Annotation and Detection of COVID-19
Fake News [8.020736472947581]
COVID-19 impacted every part of the world, although the misinformation about the outbreak traveled faster than the virus.
Existing work neglects the presence of bots that act as a catalyst in the spread.
We propose an automated approach for labeling data using verified fact-checked statements on a Twitter dataset.
arXiv Detail & Related papers (2022-09-07T13:55:59Z)
- Faking Fake News for Real Fake News Detection: Propaganda-loaded
Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 - 7.69% F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
- Cross-lingual COVID-19 Fake News Detection [54.125563009333995]
We make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) using only the fact-checked news in a high-resource language (English).
We propose a deep learning framework named CrossFake to jointly encode the cross-lingual news body texts and capture the news content.
Empirical results on our dataset demonstrate the effectiveness of CrossFake under the cross-lingual setting.
arXiv Detail & Related papers (2021-10-13T04:44:02Z)
- User Preference-aware Fake News Detection [61.86175081368782]
Existing fake news detection algorithms focus on mining news content for deceptive signals.
We propose a new framework, UPFD, which simultaneously captures various signals from user preferences by joint content and graph modeling.
arXiv Detail & Related papers (2021-04-25T21:19:24Z)
- Misinfo Belief Frames: A Case Study on Covid & Climate News [49.979419711713795]
We propose a formalism for understanding how readers perceive the reliability of news and the impact of misinformation.
We introduce the Misinfo Belief Frames (MBF) corpus, a dataset of 66k inferences over 23.5k headlines.
Our results using large-scale language modeling to predict misinformation frames show that machine-generated inferences can influence readers' trust in news headlines.
arXiv Detail & Related papers (2021-04-18T09:50:11Z)
- A Heuristic-driven Uncertainty based Ensemble Framework for Fake News
Detection in Tweets and News Articles [5.979726271522835]
We describe a novel Fake News Detection system that automatically identifies whether a news item is "real" or "fake".
We have used an ensemble model consisting of pre-trained models followed by a statistical feature fusion network.
Our proposed framework also quantifies reliable predictive uncertainty along with a proper class output confidence level for the classification task.
arXiv Detail & Related papers (2021-04-05T06:35:30Z)
- Transformer based Automatic COVID-19 Fake News Detection System [9.23545668304066]
Misinformation is especially prevalent in the ongoing coronavirus disease (COVID-19) pandemic.
We report a methodology to analyze the reliability of information shared on social media pertaining to the COVID-19 pandemic.
Our system obtained an F1-score of 0.9855 on the test set and ranked 5th among 160 teams.
arXiv Detail & Related papers (2021-01-01T06:49:27Z)
- Misinformation Has High Perplexity [55.47422012881148]
We propose to leverage the perplexity to debunk false claims in an unsupervised manner.
First, we extract reliable evidence from scientific and news sources according to sentence similarity to the claims.
Second, we prime a language model with the extracted evidence and finally evaluate the correctness of given claims based on the perplexity scores at debunking time.
arXiv Detail & Related papers (2020-06-08T15:13:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.