Related papers: SaRoHead: A Dataset for Satire Detection in Romanian Multi-Domain News Headlines

URL: http://arxiv.org/abs/2504.07612v1
Date: Thu, 10 Apr 2025 10:03:29 GMT
Title: SaRoHead: A Dataset for Satire Detection in Romanian Multi-Domain News Headlines
Authors: Mihnea-Alexandru Vîrlan, Răzvan-Alexandru Smădu, Dumitru-Clementin Cercel,
Abstract summary: SaRoHead is the first corpus for satire detection in Romanian multi-domain news headlines. Our findings show that the clickbait used in some non-satirical headlines significantly influences the model.
Score: 1.6976911886883272
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The headline is an important part of a news article, influenced by expressiveness and connection to the exposed subject. Although most news outlets aim to present reality objectively, some publications prefer a humorous approach in which stylistic elements of satire, irony, and sarcasm blend to cover specific topics. Satire detection can be difficult because a headline aims to expose the main idea behind a news article. In this paper, we propose SaRoHead, the first corpus for satire detection in Romanian multi-domain news headlines. Our findings show that the clickbait used in some non-satirical headlines significantly influences the model.

Related papers

MuSaRoNews: A Multidomain, Multimodal Satire Dataset from Romanian News Articles [1.232097230344824]
This work introduces MuSaRoNews, a multimodal corpus for satire detection in Romanian news articles. Specifically, we gathered 117,834 public news articles from real and satirical news sources, composing the first multimodal corpus for satire detection in the Romanian language.
arXiv (2025-04-10T15:02:59Z)
Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-(paraphrased) real news, machine-generated fake news, human-written fake news, and human-written real news. Our experiments reveal an interesting pattern that detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
arXiv (2023-11-02T08:39:45Z)
Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection. The hypothesis of the usage of cross-lingual evidence as a feature for fake news detection is confirmed.
arXiv (2022-11-25T18:24:17Z)
NewsEdits: A News Article Revision Dataset and a Document-Level Reasoning Challenge [122.37011526554403]
NewsEdits is the first publicly available dataset of news revision histories. It contains 1.2 million articles with 4.6 million versions from over 22 English- and French-language newspaper sources.
arXiv (2022-06-14T18:47:13Z)
SaRoCo: Detecting Satire in a Novel Romanian Corpus of News Articles [15.877673959068455]
SaRoCo is one of the largest corpora for satire detection in any language, and the only one for Romanian. We conduct experiments with two state-of-the-art deep neural models, resulting in a set of strong baselines for our novel corpus. Our results show that machine-level accuracy for satire detection in Romanian is quite low (under 73% on the test set) compared to human-level accuracy (87%), leaving enough room for improvement in future research.
arXiv (2021-05-13T17:54:37Z)
Misinfo Belief Frames: A Case Study on Covid & Climate News [49.979419711713795]
We propose a formalism for understanding how readers perceive the reliability of news and the impact of misinformation. We introduce the Misinfo Belief Frames (MBF) corpus, a dataset of 66k inferences over 23.5k headlines. Our results using large-scale language modeling to predict misinformation frames show that machine-generated inferences can influence readers' trust in news headlines.
arXiv (2021-04-18T09:50:11Z)
Birds of a Feather Flock Together: Satirical News Detection via Language Model Differentiation [7.556286423133077]
In satirical news, the lexical and pragmatic attributes of the context are the key factors in amusing readers. We propose a method that differentiates satirical news from true news.
arXiv (2020-07-04T18:46:36Z)
Diverse, Controllable, and Keyphrase-Aware: A Corpus and Method for News Multi-Headline Generation [98.98411895250774]
We propose generating multiple headlines with keyphrases of user interests. The proposed method achieves state-of-the-art results in terms of quality and diversity.
arXiv (2020-04-08T08:30:05Z)
Satirical News Detection with Semantic Feature Extraction and Game-theoretic Rough Sets [5.326582776477692]
We propose a semantic feature based approach to detect satirical news tweets. Features are extracted by exploring inconsistencies in phrases, entities, and between main and relative clauses. We apply game-theoretic rough set model to detect satirical news, in which probabilistic thresholds are derived by game equilibrium and repetition learning mechanism.
arXiv (2020-04-08T03:22:21Z)
BaitWatcher: A lightweight web interface for the detection of incongruent news headlines [27.29585619643952]
BaitWatcher is a lightweight web interface that guides readers in estimating the likelihood of incongruence in news articles before they click on the headlines. BaitWatcher uses a hierarchical recurrent encoder that efficiently learns complex textual representations of a news headline and its associated body text.
arXiv (2020-03-23T23:43:02Z)
HoaxItaly: a collection of Italian disinformation and fact-checking stories shared on Twitter in 2019 [72.96986027203377]
The dataset also includes the title and body of approximately 37k news articles. It is publicly available at https://doi.org/10.7910/DVN/PGVDHX.
arXiv (2020-01-29T16:14:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.