MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection
- URL: http://arxiv.org/abs/2403.09092v2
- Date: Wed, 24 Jul 2024 05:57:01 GMT
- Title: MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection
- Authors: Yupeng Li, Haorui He, Jin Bai, Dacheng Wen,
- Abstract summary: Methods trained on purely one single news source can hardly be applicable to real-world scenarios.
We constructed the first multi-source benchmark dataset for Chinese fake news detection, termed MCFEND.
MCFEND, as a benchmark dataset, aims to advance Chinese fake news detection approaches in real-world scenarios.
- Score: 5.288018460787191
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The prevalence of fake news across various online sources has had a significant influence on the public. Existing Chinese fake news detection datasets are limited to news sourced solely from Weibo. However, fake news originating from multiple sources exhibits diversity in various aspects, including its content and social context. Methods trained on purely one single news source can hardly be applicable to real-world scenarios. Our pilot experiment demonstrates that the F1 score of the state-of-the-art method that learns from a large Chinese fake news detection dataset, Weibo-21, drops significantly from 0.943 to 0.470 when the test data is changed to multi-source news data, failing to identify more than one-third of the multi-source fake news. To address this limitation, we constructed the first multi-source benchmark dataset for Chinese fake news detection, termed MCFEND, which is composed of news we collected from diverse sources such as social platforms, messaging apps, and traditional online news outlets. Notably, such news has been fact-checked by 14 authoritative fact-checking agencies worldwide. In addition, various existing Chinese fake news detection methods are thoroughly evaluated on our proposed dataset in cross-source, multi-source, and unseen source ways. MCFEND, as a benchmark dataset, aims to advance Chinese fake news detection approaches in real-world scenarios.
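The unseen-source evaluation mentioned in the abstract can be illustrated with a small sketch: hold out every record from one source, fit on the remaining sources, and score the held-out source with F1. This is only a sketch under stated assumptions, not the authors' protocol. The record fields (`source`, `text`, `label`), the helper names, the majority-class placeholder model, and the toy records are all hypothetical, and the abstract does not say whether the reported F1 is fake-class or macro-averaged (fake-class F1 is assumed here).

```python
from collections import Counter, defaultdict


def f1_score(y_true, y_pred, positive=1):
    """F1 for the `positive` class (assumed 1 = fake); 0.0 if no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def leave_one_source_out(records):
    """Yield (held_out_source, train, test); test holds one source, train the rest."""
    by_source = defaultdict(list)
    for record in records:
        by_source[record["source"]].append(record)
    for held_out in sorted(by_source):
        train = [r for src in sorted(by_source) if src != held_out
                 for r in by_source[src]]
        yield held_out, train, by_source[held_out]


def majority_baseline(train):
    """Placeholder 'model': always predicts the most common training label."""
    label = Counter(r["label"] for r in train).most_common(1)[0][0]
    return lambda record: label


def evaluate_unseen_source(records, fit=majority_baseline):
    """Return {held_out_source: F1} for each leave-one-source-out split."""
    scores = {}
    for held_out, train, test in leave_one_source_out(records):
        predict = fit(train)
        y_true = [r["label"] for r in test]
        y_pred = [predict(r) for r in test]
        scores[held_out] = f1_score(y_true, y_pred)
    return scores


# Made-up toy records that only exercise the code; they are not MCFEND data.
TOY_RECORDS = [
    {"source": "social_platform", "text": "claim a", "label": 1},
    {"source": "social_platform", "text": "claim b", "label": 1},
    {"source": "messaging_app", "text": "claim c", "label": 1},
    {"source": "messaging_app", "text": "claim d", "label": 0},
    {"source": "news_outlet", "text": "claim e", "label": 0},
    {"source": "news_outlet", "text": "claim f", "label": 0},
]
```

Calling `evaluate_unseen_source(TOY_RECORDS)` returns one score per held-out source. To try an actual detector, pass a `fit` function that trains on the `train` records and returns a `predict(record)` callable.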
Related papers
- Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-(paraphrased) real news, machine-generated fake news, human-written fake news, and human-written real news.
Our experiments reveal an interesting pattern that detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
arXiv Detail & Related papers (2023-11-02T08:39:45Z)
- fakenewsbr: A Fake News Detection Platform for Brazilian Portuguese [0.6775616141339018]
This paper presents a comprehensive study on detecting fake news in Brazilian Portuguese.
We propose a machine learning-based approach that leverages natural language processing techniques, including TF-IDF and Word2Vec.
We develop a user-friendly web platform, fakenewsbr.com, to facilitate the verification of news articles' veracity.
arXiv Detail & Related papers (2023-09-20T04:10:03Z)
- Unsupervised Domain-agnostic Fake News Detection using Multi-modal Weak Signals [19.22829945777267]
This work proposes an effective framework for unsupervised fake news detection, which first embeds the knowledge available in four modalities in news records.
Also, we propose a novel technique to construct news datasets minimizing the latent biases in existing news datasets.
We trained the proposed unsupervised framework using LUND-COVID to exploit the potential of large datasets.
arXiv Detail & Related papers (2023-05-18T23:49:31Z)
- Nothing Stands Alone: Relational Fake News Detection with Hypergraph Neural Networks [49.29141811578359]
We propose to leverage a hypergraph to represent group-wise interaction among news, while focusing on important news relations with its dual-level attention mechanism.
Our approach yields remarkable performance and maintains the high performance even with a small subset of labeled news data.
arXiv Detail & Related papers (2022-12-24T00:19:32Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis of the usage of cross-lingual evidence as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 - 7.69% F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
- Cross-lingual COVID-19 Fake News Detection [54.125563009333995]
We make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) using only the fact-checked news in a high-resource language (English).
We propose a deep learning framework named CrossFake to jointly encode the cross-lingual news body texts and capture the news content.
Empirical results on our dataset demonstrate the effectiveness of CrossFake under the cross-lingual setting.
arXiv Detail & Related papers (2021-10-13T04:44:02Z)
- User Preference-aware Fake News Detection [61.86175081368782]
Existing fake news detection algorithms focus on mining news content for deceptive signals.
We propose a new framework, UPFD, which simultaneously captures various signals from user preferences by joint content and graph modeling.
arXiv Detail & Related papers (2021-04-25T21:19:24Z)
- Lexicon generation for detecting fake news [0.0]
We propose a method primarily based on lexicons including a scoring system to facilitate the detection of the fake news in Turkish.
We contribute to the literature by collecting a novel, large scale, and credible dataset of Turkish news, and by constructing the first fake news detection lexicon for Turkish.
arXiv Detail & Related papers (2020-10-16T20:39:57Z)
- BanFakeNews: A Dataset for Detecting Fake News in Bangla [1.4170999534105675]
We propose an annotated dataset of 50K news that can be used for building automated fake news detection systems.
We develop a benchmark system with state of the art NLP techniques to identify Bangla fake news.
arXiv Detail & Related papers (2020-04-19T07:42:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.