SaRoHead: A Dataset for Satire Detection in Romanian Multi-Domain News Headlines
- URL: http://arxiv.org/abs/2504.07612v1
- Date: Thu, 10 Apr 2025 10:03:29 GMT
- Title: SaRoHead: A Dataset for Satire Detection in Romanian Multi-Domain News Headlines
- Authors: Mihnea-Alexandru Vîrlan, Răzvan-Alexandru Smădu, Dumitru-Clementin Cercel,
- Abstract summary: SaRoHead is the first corpus for satire detection in Romanian multi-domain news headlines. Our findings show that the clickbait used in some non-satirical headlines significantly influences the model.
- Score: 1.6976911886883272
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The headline is an important part of a news article, influenced by expressiveness and connection to the exposed subject. Although most news outlets aim to present reality objectively, some publications prefer a humorous approach in which stylistic elements of satire, irony, and sarcasm blend to cover specific topics. Satire detection can be difficult because a headline aims to expose the main idea behind a news article. In this paper, we propose SaRoHead, the first corpus for satire detection in Romanian multi-domain news headlines. Our findings show that the clickbait used in some non-satirical headlines significantly influences the model.
Related papers
- MuSaRoNews: A Multidomain, Multimodal Satire Dataset from Romanian News Articles [1.232097230344824]
This work introduces a multimodal corpus for satire detection in Romanian news articles named MuSaRoNews. Specifically, we gathered 117,834 public news articles from real and satirical news sources, composing the first multimodal corpus for satire detection in the Romanian language.
arXiv Detail & Related papers (2025-04-10T15:02:59Z)
- Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-paraphrased real news, machine-generated fake news, human-written fake news, and human-written real news.
Our experiments reveal an interesting pattern that detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
arXiv Detail & Related papers (2023-11-02T08:39:45Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis that cross-lingual evidence can serve as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- NewsEdits: A News Article Revision Dataset and a Document-Level Reasoning Challenge [122.37011526554403]
NewsEdits is the first publicly available dataset of news revision histories.
It contains 1.2 million articles with 4.6 million versions from over 22 English- and French-language newspaper sources.
arXiv Detail & Related papers (2022-06-14T18:47:13Z)
- SaRoCo: Detecting Satire in a Novel Romanian Corpus of News Articles [15.877673959068455]
SaRoCo is one of the largest corpora for satire detection regardless of language and the only one for the Romanian language.
We conduct experiments with two state-of-the-art deep neural models, resulting in a set of strong baselines for our novel corpus.
Our results show that the machine-level accuracy for satire detection in Romanian is quite low (under 73% on the test set) compared to the human-level accuracy (87%), leaving enough room for improvement in future research.
arXiv Detail & Related papers (2021-05-13T17:54:37Z)
- Misinfo Belief Frames: A Case Study on Covid & Climate News [49.979419711713795]
We propose a formalism for understanding how readers perceive the reliability of news and the impact of misinformation.
We introduce the Misinfo Belief Frames (MBF) corpus, a dataset of 66k inferences over 23.5k headlines.
Our results using large-scale language modeling to predict misinformation frames show that machine-generated inferences can influence readers' trust in news headlines.
arXiv Detail & Related papers (2021-04-18T09:50:11Z)
- Birds of a Feather Flock Together: Satirical News Detection via Language Model Differentiation [7.556286423133077]
In satirical news, the lexical and pragmatical attributes of the context are the key factors in amusing the readers.
We propose a method that differentiates satirical news from true news.
arXiv Detail & Related papers (2020-07-04T18:46:36Z)
- Diverse, Controllable, and Keyphrase-Aware: A Corpus and Method for News Multi-Headline Generation [98.98411895250774]
We propose generating multiple headlines with keyphrases of user interests.
The proposed method achieves state-of-the-art results in terms of quality and diversity.
arXiv Detail & Related papers (2020-04-08T08:30:05Z)
- Satirical News Detection with Semantic Feature Extraction and Game-theoretic Rough Sets [5.326582776477692]
We propose a semantic feature-based approach to detect satirical news tweets.
Features are extracted by exploring inconsistencies in phrases, entities, and between main and relative clauses.
We apply a game-theoretic rough set model to detect satirical news, in which probabilistic thresholds are derived by game equilibrium and a repetition learning mechanism.
arXiv Detail & Related papers (2020-04-08T03:22:21Z)
- BaitWatcher: A lightweight web interface for the detection of incongruent news headlines [27.29585619643952]
BaitWatcher is a lightweight web interface that guides readers in estimating the likelihood of incongruence in news articles before clicking on the headlines.
BaitWatcher utilizes a hierarchical recurrent encoder that efficiently learns complex textual representations of a news headline and its associated body text.
arXiv Detail & Related papers (2020-03-23T23:43:02Z)
- HoaxItaly: a collection of Italian disinformation and fact-checking stories shared on Twitter in 2019 [72.96986027203377]
The dataset also includes the title and body for approximately 37k news articles.
It is publicly available at https://doi.org/10.7910/DVN/PGVDHX.
arXiv Detail & Related papers (2020-01-29T16:14:47Z)