SaRoHead: Detecting Satire in a Multi-Domain Romanian News Headline Dataset
- URL: http://arxiv.org/abs/2504.07612v3
- Date: Sun, 31 Aug 2025 15:19:07 GMT
- Title: SaRoHead: Detecting Satire in a Multi-Domain Romanian News Headline Dataset
- Authors: Mihnea-Alexandru Vîrlan, Răzvan-Alexandru Smădu, Dumitru-Clementin Cercel, Florin Pop, Mihaela-Claudia Cercel
- Abstract summary: Even the headline must reflect the tone of the satirical main content. Current approaches for the Romanian language detect the tone by combining the main article and the headline.
- Score: 3.1208433686641666
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The primary goal of a news headline is to summarize an event in as few words as possible. Depending on the media outlet, a headline can serve as a means to objectively deliver a summary or improve its visibility. For the latter, specific publications may employ stylistic approaches that incorporate the use of sarcasm, irony, and exaggeration, key elements of a satirical approach. As such, even the headline must reflect the tone of the satirical main content. Current approaches for the Romanian language tend to detect the non-conventional tone (i.e., satire and clickbait) of the news content by combining both the main article and the headline. Because we consider a headline to be merely a brief summary of the main article, we investigate in this paper the presence of satirical tone in headlines alone, testing multiple baselines ranging from standard machine learning algorithms to deep learning models. Our experiments show that Bidirectional Transformer models outperform both standard machine-learning approaches and Large Language Models (LLMs), particularly when the meta-learning Reptile approach is employed.
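The abstract names the Reptile meta-learning approach without describing it. As a rough illustration only, the general Reptile update (adapt a copy of the shared parameters to one sampled task for a few gradient steps, then move the shared parameters a fraction of the way toward the adapted copy) can be sketched on a toy problem. This is not the authors' implementation: the scalar regression tasks, the function names `inner_sgd` and `reptile`, and all hyperparameters are assumptions chosen for brevity, standing in for per-domain satire classification with a Transformer.

```python
import random


def inner_sgd(w, slope, steps=5, lr=0.1, rng=None):
    """Adapt parameter w to one toy task: fit y = slope * x with y_hat = w * x."""
    if rng is None:
        rng = random.Random(0)
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        # Gradient of the squared error (w*x - slope*x)^2 with respect to w.
        grad = 2.0 * (w * x - slope * x) * x
        w -= lr * grad
    return w


def reptile(task_slopes, meta_steps=2000, meta_lr=0.1, seed=0):
    """Reptile outer loop over a list of toy tasks (hypothetical setup)."""
    rng = random.Random(seed)
    theta = 0.0  # shared initialization being meta-learned
    for _ in range(meta_steps):
        slope = rng.choice(task_slopes)          # sample one task
        phi = inner_sgd(theta, slope, rng=rng)   # task-adapted copy
        theta += meta_lr * (phi - theta)         # interpolate toward it
    return theta


# Three toy "domains", each with its own optimum.
theta = reptile([1.0, 2.0, 3.0])
print(f"meta-learned initialization: {theta:.3f}")
```

In this toy setting the learned initialization settles between the task optima, so a few inner steps suffice to adapt to any one task; in a multi-domain setting, each domain would play the role of a task.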
Related papers
- CrossNews-UA: A Cross-lingual News Semantic Similarity Benchmark for Ukrainian, Polish, Russian, and English [53.32175252285023]
Cross-lingual news comparison offers a promising approach to verify information. Existing datasets for cross-lingual news analysis were manually curated by journalists and experts. We introduce a scalable, explainable crowdsourcing pipeline for cross-lingual news similarity assessment.
arXiv Detail & Related papers (2025-10-22T14:23:50Z) - SeLeRoSa: Sentence-Level Romanian Satire Detection Dataset [2.709981170021896]
We introduce the first sentence-level dataset for Romanian satire detection for news articles, called SeLeRoSa. The dataset comprises 13,873 manually annotated sentences spanning various domains, including social issues, IT, science, and movies.
arXiv Detail & Related papers (2025-08-31T15:12:51Z) - MuSaRoNews: A Multidomain, Multimodal Satire Dataset from Romanian News Articles [1.232097230344824]
This work introduces a multimodal corpus for satire detection in Romanian news articles named MuSaRoNews. Specifically, we gathered 117,834 public news articles from real and satirical news sources, composing the first multimodal corpus for satire detection in the Romanian language.
arXiv Detail & Related papers (2025-04-10T15:02:59Z) - SCStory: Self-supervised and Continual Online Story Discovery [53.72745249384159]
SCStory helps people digest rapidly published news article streams in real-time without human annotations.
SCStory employs self-supervised and continual learning with a novel idea of story-indicative adaptive modeling of news article streams.
arXiv Detail & Related papers (2023-11-27T04:50:01Z) - Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-(paraphrased) real news, machine-generated fake news, human-written fake news, and human-written real news.
Our experiments reveal an interesting pattern that detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
arXiv Detail & Related papers (2023-11-02T08:39:45Z) - Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis of the usage of cross-lingual evidence as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z) - NewsEdits: A News Article Revision Dataset and a Document-Level
Reasoning Challenge [122.37011526554403]
NewsEdits is the first publicly available dataset of news revision histories.
It contains 1.2 million articles with 4.6 million versions from over 22 English- and French-language newspaper sources.
arXiv Detail & Related papers (2022-06-14T18:47:13Z) - SaRoCo: Detecting Satire in a Novel Romanian Corpus of News Articles [15.877673959068455]
One of the largest corpora for satire detection regardless of language and the only one for the Romanian language.
We conduct experiments with two state-of-the-art deep neural models, resulting in a set of strong baselines for our novel corpus.
Our results show that the machine-level accuracy for satire detection in Romanian is quite low (under 73% on the test set) compared to the human-level accuracy (87%), leaving enough room for improvement in future research.
arXiv Detail & Related papers (2021-05-13T17:54:37Z) - Misinfo Belief Frames: A Case Study on Covid & Climate News [49.979419711713795]
We propose a formalism for understanding how readers perceive the reliability of news and the impact of misinformation.
We introduce the Misinfo Belief Frames (MBF) corpus, a dataset of 66k inferences over 23.5k headlines.
Our results using large-scale language modeling to predict misinformation frames show that machine-generated inferences can influence readers' trust in news headlines.
arXiv Detail & Related papers (2021-04-18T09:50:11Z) - Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z) - Birds of a Feather Flock Together: Satirical News Detection via Language Model Differentiation [7.556286423133077]
In satirical news, the lexical and pragmatical attributes of the context are the key factors in amusing the readers.
We propose a method that differentiates between satirical news and true news.
arXiv Detail & Related papers (2020-07-04T18:46:36Z) - On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z) - Diverse, Controllable, and Keyphrase-Aware: A Corpus and Method for News Multi-Headline Generation [98.98411895250774]
We propose generating multiple headlines with keyphrases of user interests.
The proposed method achieves state-of-the-art results in terms of quality and diversity.
arXiv Detail & Related papers (2020-04-08T08:30:05Z) - Satirical News Detection with Semantic Feature Extraction and Game-theoretic Rough Sets [5.326582776477692]
We propose a semantic feature based approach to detect satirical news tweets.
Features are extracted by exploring inconsistencies in phrases, entities, and between main and relative clauses.
We apply a game-theoretic rough set model to detect satirical news, in which probabilistic thresholds are derived by game equilibrium and a repetition learning mechanism.
arXiv Detail & Related papers (2020-04-08T03:22:21Z) - BaitWatcher: A lightweight web interface for the detection of incongruent news headlines [27.29585619643952]
BaitWatcher is a lightweight web interface that guides readers in estimating the likelihood of incongruence in news articles before clicking on the headlines.
BaitWatcher utilizes a hierarchical recurrent encoder that efficiently learns complex textual representations of a news headline and its associated body text.
arXiv Detail & Related papers (2020-03-23T23:43:02Z) - HoaxItaly: a collection of Italian disinformation and fact-checking stories shared on Twitter in 2019 [72.96986027203377]
The dataset also includes the title and body for approximately 37k news articles.
It is publicly available at https://doi.org/10.7910/DVN/PGVDHX.
arXiv Detail & Related papers (2020-01-29T16:14:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.