A Multi-Label Dataset of French Fake News: Human and Machine Insights
- URL: http://arxiv.org/abs/2403.16099v2
- Date: Thu, 11 Apr 2024 09:58:17 GMT
- Title: A Multi-Label Dataset of French Fake News: Human and Machine Insights
- Authors: Benjamin Icard, François Maine, Morgane Casanova, Géraud Faye, Julien Chanson, Guillaume Gadek, Ghislain Atemezing, François Bancilhon, Paul Égré
- Abstract summary: We present a corpus of 100 documents, OBSINFOX, selected from 17 sources of French press considered unreliable by expert agencies.
By collecting more labels than usual, we can identify features that humans consider as characteristic of fake news.
We present a topic and genre analysis using Gate Cloud, indicative of the prevalence of satire-like text in the corpus.
- Score: 0.5533610982157059
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a corpus of 100 documents, OBSINFOX, selected from 17 sources of French press considered unreliable by expert agencies, annotated using 11 labels by 8 annotators. By collecting more labels than usual, from more annotators than is typical, we can identify features that humans consider as characteristic of fake news, and compare them to the predictions of automated classifiers. We present a topic and genre analysis using Gate Cloud, indicative of the prevalence of satire-like text in the corpus. We then use the subjectivity analyzer VAGO, and a neural version of it, to clarify the link between ascriptions of the label Subjective and ascriptions of the label Fake News. The annotated dataset is available online at the following URL: https://github.com/obs-info/obsinfox
- Keywords: Fake News, Multi-Labels, Subjectivity, Vagueness, Detail, Opinion, Exaggeration, French Press
Related papers
- FineFake: A Knowledge-Enriched Dataset for Fine-Grained Multi-Domain Fake News Detection [54.37159298632628]
FineFake is a multi-domain knowledge-enhanced benchmark for fake news detection.
FineFake encompasses 16,909 data samples spanning six semantic topics and eight platforms.
The entire FineFake project is publicly accessible as an open-source repository.
arXiv Detail & Related papers (2024-03-30T14:39:09Z)
- Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classification [0.7749297275724032]
This paper investigates the language of propaganda and its stylistic features.
It presents the PPN dataset, composed of news articles extracted from websites identified as propaganda sources.
We propose different NLP techniques to identify the cues used by the annotators, and to compare them with machine classification.
arXiv Detail & Related papers (2024-02-06T07:51:54Z)
- Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions [50.92702206798324]
We propose a generative prompting framework for zero-shot text classification.
GEN-Z measures the LM likelihood of input text conditioned on natural language descriptions of labels.
We show that zero-shot classification with simple contextualization of the data source consistently outperforms both zero-shot and few-shot baselines.
arXiv Detail & Related papers (2023-11-13T07:12:57Z)
- Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-paraphrased real news, machine-generated fake news, human-written fake news, and human-written real news.
Our experiments reveal an interesting pattern that detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
arXiv Detail & Related papers (2023-11-02T08:39:45Z)
- Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
We employ Self-Supervised Learning (SSL) in the model learning process and design a novel self-supervised Relation of Relation (R2) classification task.
We propose a Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as optimization targets.
We incorporate external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
- It's All in the Embedding! Fake News Detection Using Document Embeddings [0.6091702876917281]
We propose a new approach that uses document embeddings to build multiple models that accurately label news articles as reliable or fake.
We also present a benchmark on different architectures that detect fake news using binary or multi-labeled classification.
arXiv Detail & Related papers (2023-04-16T13:30:06Z)
- FreCDo: A Large Corpus for French Cross-Domain Dialect Identification [22.132457694021184]
We present a novel corpus for French dialect identification comprising 413,522 French text samples.
The training, validation and test splits are collected from different news websites.
This leads to a French cross-domain (FreCDo) dialect identification task.
arXiv Detail & Related papers (2022-12-15T10:32:29Z)
- Predictive linguistic cues for fake news: a societal artificial intelligence problem [9.40467099889021]
We present linguistic characteristics of media news items to differentiate between fake news and real news using machine learning algorithms.
We use neural networks which mainly control distributional features rather than evidence.
The features unique, negative, positive, and cardinal numbers, which take high values on the metrics, are observed to provide a high area under the curve (AUC) and F1-score.
arXiv Detail & Related papers (2022-11-26T07:50:01Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis of the usage of cross-lingual evidence as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- Label Noise-Resistant Mean Teaching for Weakly Supervised Fake News Detection [93.6222609806278]
We propose a novel label noise-resistant mean teaching approach (LNMT) for weakly supervised fake news detection.
LNMT leverages unlabeled news and feedback comments of users to enlarge the amount of training data.
LNMT establishes a mean teacher framework equipped with label propagation and label reliability estimation.
arXiv Detail & Related papers (2022-06-10T16:01:58Z)
- A Heuristic-driven Uncertainty based Ensemble Framework for Fake News Detection in Tweets and News Articles [5.979726271522835]
We describe a novel Fake News Detection system that automatically identifies whether a news item is "real" or "fake".
We have used an ensemble model consisting of pre-trained models followed by a statistical feature fusion network.
Our proposed framework also quantifies reliable predictive uncertainty, along with a proper class output confidence level, for the classification task.
arXiv Detail & Related papers (2021-04-05T06:35:30Z)