MuSaRoNews: A Multidomain, Multimodal Satire Dataset from Romanian News Articles
- URL: http://arxiv.org/abs/2504.07826v1
- Date: Thu, 10 Apr 2025 15:02:59 GMT
- Title: MuSaRoNews: A Multidomain, Multimodal Satire Dataset from Romanian News Articles
- Authors: Răzvan-Alexandru Smădu, Andreea Iuga, Dumitru-Clementin Cercel,
- Abstract summary: This work introduces a multimodal corpus for satire detection in Romanian news articles named MuSaRoNews. Specifically, we gathered 117,834 public news articles from real and satirical news sources, composing the first multimodal corpus for satire detection in the Romanian language.
- Score: 1.232097230344824
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Satire and fake news can both contribute to the spread of false information, even though they have different purposes (one is for amusement, the other is to misinform). However, it is not enough to rely purely on text to detect the incongruity between the surface meaning and the actual meaning of the news articles, and other sources of information (e.g., visual) often provide an important clue for satire detection. This work introduces a multimodal corpus for satire detection in Romanian news articles named MuSaRoNews. Specifically, we gathered 117,834 public news articles from real and satirical news sources, composing the first multimodal corpus for satire detection in the Romanian language. We conducted experiments and showed that the use of both modalities improves performance.
Related papers
- SaRoHead: A Dataset for Satire Detection in Romanian Multi-Domain News Headlines [1.6976911886883272]
SaRoHead is the first corpus for satire detection in Romanian multi-domain news headlines. Our findings show that the clickbait used in some non-satirical headlines significantly influences the model.
arXiv Detail & Related papers (2025-04-10T10:03:29Z)
- Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-(paraphrased) real news, machine-generated fake news, human-written fake news, and human-written real news.
Our experiments reveal an interesting pattern that detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
arXiv Detail & Related papers (2023-11-02T08:39:45Z)
- Fake News in Sheep's Clothing: Robust Fake News Detection Against LLM-Empowered Style Attacks [60.14025705964573]
SheepDog is a style-robust fake news detector that prioritizes content over style in determining news veracity.
SheepDog achieves this resilience through (1) LLM-empowered news reframings that inject style diversity into the training process by customizing articles to match different styles; (2) a style-agnostic training scheme that ensures consistent veracity predictions across style-diverse reframings; and (3) content-focused attributions that distill content-centric guidelines from LLMs for debunking fake news.
arXiv Detail & Related papers (2023-10-16T21:05:12Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis of the usage of cross-lingual evidence as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 - 7.69% F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
- Cross-lingual COVID-19 Fake News Detection [54.125563009333995]
We make the first attempt to detect COVID-19 misinformation in a low-resource language (Chinese) using only the fact-checked news in a high-resource language (English).
We propose a deep learning framework named CrossFake to jointly encode the cross-lingual news body texts and capture the news content.
Empirical results on our dataset demonstrate the effectiveness of CrossFake under the cross-lingual setting.
arXiv Detail & Related papers (2021-10-13T04:44:02Z)
- SaRoCo: Detecting Satire in a Novel Romanian Corpus of News Articles [15.877673959068455]
SaRoCo is one of the largest corpora for satire detection regardless of language and the only one for the Romanian language.
We conduct experiments with two state-of-the-art deep neural models, resulting in a set of strong baselines for our novel corpus.
Our results show that the machine-level accuracy for satire detection in Romanian is quite low (under 73% on the test set) compared to the human-level accuracy (87%), leaving enough room for improvement in future research.
arXiv Detail & Related papers (2021-05-13T17:54:37Z)
- User Preference-aware Fake News Detection [61.86175081368782]
Existing fake news detection algorithms focus on mining news content for deceptive signals.
We propose a new framework, UPFD, which simultaneously captures various signals from user preferences by joint content and graph modeling.
arXiv Detail & Related papers (2021-04-25T21:19:24Z)
- Fake or Real? A Study of Arabic Satirical Fake News [3.007949058551534]
This study conducts several exploratory analyses to identify the linguistic properties of Arabic fake news with satirical content.
We exploit these features to build a number of machine learning models capable of identifying satirical fake news with an accuracy of up to 98.6%.
arXiv Detail & Related papers (2020-11-01T08:56:56Z)
- Birds of a Feather Flock Together: Satirical News Detection via Language Model Differentiation [7.556286423133077]
In satirical news, the lexical and pragmatical attributes of the context are the key factors in amusing the readers.
We propose a method that differentiates satirical news from true news.
arXiv Detail & Related papers (2020-07-04T18:46:36Z)
- Satirical News Detection with Semantic Feature Extraction and Game-theoretic Rough Sets [5.326582776477692]
We propose a semantic feature based approach to detect satirical news tweets.
Features are extracted by exploring inconsistencies in phrases, entities, and between main and relative clauses.
We apply a game-theoretic rough set model to detect satirical news, in which probabilistic thresholds are derived by game equilibrium and a repetition learning mechanism.
arXiv Detail & Related papers (2020-04-08T03:22:21Z)