Cross-lingual COVID-19 Fake News Detection
- URL: http://arxiv.org/abs/2110.06495v2
- Date: Thu, 14 Oct 2021 19:54:41 GMT
- Title: Cross-lingual COVID-19 Fake News Detection
- Authors: Jiangshu Du, Yingtong Dou, Congying Xia, Limeng Cui, Jing Ma, Philip
S. Yu
- Abstract summary: We make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) only using the fact-checked news in a high-resource language (English)
We propose a deep learning framework named CrossFake to jointly encode the cross-lingual news body texts and capture the news content.
Empirical results on our dataset demonstrate the effectiveness of CrossFake under the cross-lingual setting.
- Score: 54.125563009333995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The COVID-19 pandemic poses a great threat to global public health.
Meanwhile, there is massive misinformation associated with the pandemic which
advocates unfounded or unscientific claims. Even major social media and news
outlets have made an extra effort in debunking COVID-19 misinformation, most of
the fact-checking information is in English, whereas some unmoderated COVID-19
misinformation is still circulating in other languages, threatening the health
of less-informed people in immigrant communities and developing countries. In
this paper, we make the first attempt to detect COVID-19 misinformation in a
low-resource language (Chinese) only using the fact-checked news in a
high-resource language (English). We start by curating a Chinese real&fake news
dataset according to existing fact-checking information. Then, we propose a
deep learning framework named CrossFake to jointly encode the cross-lingual
news body texts and capture the news content as much as possible. Empirical
results on our dataset demonstrate the effectiveness of CrossFake under the
cross-lingual setting and it also outperforms several monolingual and
cross-lingual fake news detectors. The dataset is available at
https://github.com/YingtongDou/CrossFake.
Related papers
- Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-(paraphrased) real news, machine-generated fake news, human-written fake news, and human-written real news.
Our experiments reveal an interesting pattern that detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
arXiv Detail & Related papers (2023-11-02T08:39:45Z) - Detecting Chinese Fake News on Twitter during the COVID-19 Pandemic [14.572408454393042]
The outbreak of COVID-19 has led to a global surge of Sinophobia partly because of the spread of misinformation, disinformation, and fake news on China.
We report on the creation of a novel classifier that detects whether Chinese-language social media posts from Twitter are related to fake news about China.
We provide the final model and a new training dataset with 18,425 tweets for researchers to study fake news in the Chinese language during the COVID-19 pandemic.
arXiv Detail & Related papers (2023-04-07T03:03:11Z) - Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis of the usage of cross-lingual evidence as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z) - "COVID-19 was a FIFA conspiracy #curropt": An Investigation into the
Viral Spread of COVID-19 Misinformation [60.268682953952506]
We estimate the extent to which misinformation has influenced the course of the COVID-19 pandemic using natural language processing models.
We provide a strategy to combat social media posts that are likely to cause widespread harm.
arXiv Detail & Related papers (2022-06-12T19:41:01Z) - Faking Fake News for Real Fake News Detection: Propaganda-loaded
Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 - 7.69% F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z) - AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News and Hate Speech
Detection Dataset [0.0]
"AraCOVID19-MFH" is a manually annotated multi-label Arabic COVID-19 fake news and hate speech detection dataset.
Our dataset contains 10,828 Arabic tweets annotated with 10 different labels.
It can also be used for hate speech detection, opinion/news classification, dialect identification, and many other tasks.
arXiv Detail & Related papers (2021-05-07T09:52:44Z) - User Preference-aware Fake News Detection [61.86175081368782]
Existing fake news detection algorithms focus on mining news content for deceptive signals.
We propose a new framework, UPFD, which simultaneously captures various signals from user preferences by joint content and graph modeling.
arXiv Detail & Related papers (2021-04-25T21:19:24Z) - Cross-SEAN: A Cross-Stitch Semi-Supervised Neural Attention Model for
COVID-19 Fake News Detection [14.771202995527315]
COVID-19 related fake news has been spreading faster than the facts.
We introduce CTF, the first COVID-19 Twitter fake news dataset with labeled genuine and fake tweets.
We also propose Cross-SEAN, a cross-stitch based semi-supervised end-to-end neural attention model.
arXiv Detail & Related papers (2021-02-17T18:30:43Z) - Evaluating Deep Learning Approaches for Covid19 Fake News Detection [0.0]
We look at automated techniques for fake news detection from a data mining perspective.
We evaluate different supervised text classification algorithms on Contraint@AAAI 2021 Covid-19 Fake news detection dataset.
We report the best accuracy of 98.41% on the Covid-19 Fake news detection dataset.
arXiv Detail & Related papers (2021-01-11T16:39:03Z) - MM-COVID: A Multilingual and Multimodal Data Repository for Combating
COVID-19 Disinformation [37.52398946169075]
This dataset provides the multilingual fake news and the relevant social context.
We collect 3981 pieces of fake news content and 7192 trustworthy information from English, Spanish, Portuguese, Hindi, French and Italian, 6 different languages.
arXiv Detail & Related papers (2020-11-08T21:42:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.