Similarity Detection Pipeline for Crawling a Topic Related Fake News Corpus
- URL: http://arxiv.org/abs/2009.13367v2
- Date: Mon, 1 Mar 2021 12:32:57 GMT
- Title: Similarity Detection Pipeline for Crawling a Topic Related Fake News Corpus
- Authors: Inna Vogel, Jeong-Eun Choi, Meghana Meghana
- Abstract summary: We propose a new, publicly available German topic related corpus for fake news detection.
We also develop a pipeline for crawling similar news articles.
As our third contribution, we conduct different learning experiments to detect fake news.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fake news detection is a challenging task aiming to reduce human time and
effort to check the truthfulness of news. Automated approaches to combat fake
news, however, are limited by the lack of labeled benchmark datasets,
especially in languages other than English. Moreover, many publicly available
corpora have specific limitations that make them difficult to use. To address
this problem, our contribution is threefold. First, we propose a new, publicly
available German topic related corpus for fake news detection. To the best of
our knowledge, this is the first corpus of its kind. In this regard, we
developed a pipeline for crawling similar news articles. As our third
contribution, we conduct different learning experiments to detect fake news.
The best performance was achieved using sentence level embeddings from SBERT in
combination with a Bi-LSTM (k=0.88).
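The abstract does not say how the pipeline decides that two articles are similar. As a rough illustration of the general idea only, the sketch below scores crawled candidates against a seed article using cosine similarity over plain term-frequency vectors. The function names, the tokenizer, and the 0.3 threshold are all hypothetical choices made for this example, not the authors' method; embedding-based scores such as those from SBERT could be substituted for the term-frequency vectors.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    freq_a, freq_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    # Only terms shared by both texts contribute to the dot product.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_similar(seed, candidates, threshold=0.3):
    """Return (score, text) pairs at or above the threshold, best first.

    The threshold value is an assumption chosen for illustration.
    """
    scored = [(cosine_similarity(seed, c), c) for c in candidates]
    return sorted((p for p in scored if p[0] >= threshold), reverse=True)
```

Calling `filter_similar(seed_text, crawled_texts)` would keep only the candidates that share enough vocabulary with the seed article, which is one simple way to collect topic related articles around a known item.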
Related papers
- Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-(paraphrased) real news, machine-generated fake news, human-written fake news, and human-written real news.
Our experiments reveal an interesting pattern that detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
arXiv Detail & Related papers (2023-11-02T08:39:45Z)
- WSDMS: Debunk Fake News via Weakly Supervised Detection of Misinforming Sentences with Contextualized Social Wisdom [13.92421433941043]
We investigate a novel task in the field of fake news debunking, which involves detecting sentence-level misinformation.
Inspired by the Multiple Instance Learning (MIL) approach, we propose a model called Weakly Supervised Detection of Misinforming Sentences (WSDMS).
We evaluate WSDMS on three real-world benchmarks and demonstrate that it outperforms existing state-of-the-art baselines in debunking fake news at both the sentence and article levels.
arXiv Detail & Related papers (2023-10-25T12:06:55Z)
- Nothing Stands Alone: Relational Fake News Detection with Hypergraph Neural Networks [49.29141811578359]
We propose to leverage a hypergraph to represent group-wise interaction among news, while focusing on important news relations with its dual-level attention mechanism.
Our approach yields remarkable performance and maintains the high performance even with a small subset of labeled news data.
arXiv Detail & Related papers (2022-12-24T00:19:32Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The experiments confirm the hypothesis that cross-lingual evidence can serve as a feature for fake news detection.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- Fake news detection using parallel BERT deep neural networks [1.0359008237358598]
We introduce MWPBert, which uses two parallel BERT networks to perform veracity detection on full-text news articles.
Experiment results showed that the proposed model outperformed previous models in terms of accuracy and other performance measures.
arXiv Detail & Related papers (2022-04-10T23:16:00Z)
- Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 - 7.69% F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
- Cross-lingual COVID-19 Fake News Detection [54.125563009333995]
We make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) using only the fact-checked news in a high-resource language (English).
We propose a deep learning framework named CrossFake to jointly encode the cross-lingual news body texts and capture the news content.
Empirical results on our dataset demonstrate the effectiveness of CrossFake under the cross-lingual setting.
arXiv Detail & Related papers (2021-10-13T04:44:02Z)
- User Preference-aware Fake News Detection [61.86175081368782]
Existing fake news detection algorithms focus on mining news content for deceptive signals.
We propose a new framework, UPFD, which simultaneously captures various signals from user preferences by joint content and graph modeling.
arXiv Detail & Related papers (2021-04-25T21:19:24Z)
- Supporting verification of news articles with automated search for semantically similar articles [0.0]
We propose an evidence retrieval approach to handle fake news.
The learning task is formulated as an unsupervised machine learning problem.
We find that our approach is agnostic to concept drifts, i.e. the machine learning task is independent of the hypotheses in a text.
arXiv Detail & Related papers (2021-03-29T12:56:59Z)
- BanFakeNews: A Dataset for Detecting Fake News in Bangla [1.4170999534105675]
We propose an annotated dataset of 50K news articles that can be used for building automated fake news detection systems.
We develop a benchmark system with state-of-the-art NLP techniques to identify Bangla fake news.
arXiv Detail & Related papers (2020-04-19T07:42:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.