Related papers: SaRoCo: Detecting Satire in a Novel Romanian Corpus of News Articles

URL: http://arxiv.org/abs/2105.06456v2
Date: Fri, 14 May 2021 05:24:26 GMT
Title: SaRoCo: Detecting Satire in a Novel Romanian Corpus of News Articles
Authors: Ana-Cristina Rogoz, Mihaela Gaman, Radu Tudor Ionescu
Abstract summary: SaRoCo is one of the largest corpora for satire detection regardless of language and the only one for the Romanian language. We conduct experiments with two state-of-the-art deep neural models, resulting in a set of strong baselines for our novel corpus. Our results show that the machine-level accuracy for satire detection in Romanian is quite low (under 73% on the test set) compared to the human-level accuracy (87%), leaving enough room for improvement in future research.
Score: 15.877673959068455
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we introduce a corpus for satire detection in Romanian news. We gathered 55,608 public news articles from multiple real and satirical news sources, composing one of the largest corpora for satire detection regardless of language and the only one for the Romanian language. We provide an official split of the text samples, such that training news articles belong to different sources than test news articles, thus ensuring that models do not achieve high performance simply due to overfitting. We conduct experiments with two state-of-the-art deep neural models, resulting in a set of strong baselines for our novel corpus. Our results show that the machine-level accuracy for satire detection in Romanian is quite low (under 73% on the test set) compared to the human-level accuracy (87%), leaving enough room for improvement in future research.
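The source-disjoint split described in the abstract can be illustrated with a minimal sketch. This is not the authors' splitting procedure or the official SaRoCo split; the function name `split_by_source`, the `source` field, and the toy records are all hypothetical, chosen only to show the idea that every article from a given source lands on exactly one side of the split.

```python
import random
from collections import defaultdict


def split_by_source(articles, test_fraction=0.2, seed=0):
    """Split articles so that no news source appears in both train and test.

    `articles` is a list of dicts with a hypothetical "source" key.
    """
    # Group articles by the source they were collected from.
    by_source = defaultdict(list)
    for article in articles:
        by_source[article["source"]].append(article)

    # Shuffle the sources (not the articles) with a fixed seed.
    sources = sorted(by_source)
    random.Random(seed).shuffle(sources)

    # Hold out whole sources for testing; always hold out at least one.
    n_test = max(1, round(len(sources) * test_fraction))
    test_sources = set(sources[:n_test])

    train = [a for s in sources if s not in test_sources for a in by_source[s]]
    test = [a for s in sources if s in test_sources for a in by_source[s]]
    return train, test


# Toy data: five hypothetical sources with four articles each.
toy_articles = [
    {"source": f"site_{i}", "text": f"article {j}", "label": i % 2}
    for i in range(5)
    for j in range(4)
]
train_set, test_set = split_by_source(toy_articles)
```

Because whole sources are held out, a model cannot score well on the test side merely by memorizing source-specific style, which is the motivation the abstract gives for the split.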

Related papers

CrossNews-UA: A Cross-lingual News Semantic Similarity Benchmark for Ukrainian, Polish, Russian, and English [53.32175252285023]
Cross-lingual news comparison offers a promising approach to verify information. Existing datasets for cross-lingual news analysis were manually curated by journalists and experts. We introduce a scalable, explainable crowdsourcing pipeline for cross-lingual news similarity assessment.
arXiv Detail & Related papers (2025-10-22T14:23:50Z)
SeLeRoSa: Sentence-Level Romanian Satire Detection Dataset [2.709981170021896]
We introduce the first sentence-level dataset for Romanian satire detection for news articles, called SeLeRoSa. The dataset comprises 13,873 manually annotated sentences spanning various domains, including social issues, IT, science, and movies.
arXiv Detail & Related papers (2025-08-31T15:12:51Z)
MuSaRoNews: A Multidomain, Multimodal Satire Dataset from Romanian News Articles [1.232097230344824]
This work introduces a multimodal corpus for satire detection in Romanian news articles named MuSaRoNews. Specifically, we gathered 117,834 public news articles from real and satirical news sources, composing the first multimodal corpus for satire detection in the Romanian language.
arXiv Detail & Related papers (2025-04-10T15:02:59Z)
SaRoHead: A Dataset for Satire Detection in Romanian Multi-Domain News Headlines [1.6976911886883272]
SaRoHead is the first corpus for satire detection in Romanian multi-domain news headlines. Our findings show that the clickbait used in some non-satirical headlines significantly influences the model.
arXiv Detail & Related papers (2025-04-10T10:03:29Z)
Make Satire Boring Again: Reducing Stylistic Bias of Satirical Corpus by Utilizing Generative LLMs [0.0]
This study proposes a debiasing approach for satire detection, focusing on reducing biases in training data by utilizing generative large language models. Results show that the debiasing method enhances the robustness and generalizability of the models for satire and irony detection tasks in Turkish and English. This work curates and presents the Turkish Satirical News dataset with detailed human annotations, with case studies on classification, debiasing, and explainability.
arXiv Detail & Related papers (2024-12-12T12:57:55Z)
Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-paraphrased real news, machine-generated fake news, human-written fake news, and human-written real news. Our experiments reveal an interesting pattern: detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
arXiv Detail & Related papers (2023-11-02T08:39:45Z)
Tackling Fake News in Bengali: Unraveling the Impact of Summarization vs. Augmentation on Pre-trained Language Models [0.0]
We propose a methodology consisting of four distinct approaches to classify fake news articles in Bengali. Our approach includes translating English news articles and using augmentation techniques to curb the deficit of fake news articles. We show the effectiveness of summarization and augmentation in the case of Bengali fake news detection.
arXiv Detail & Related papers (2023-07-13T14:50:55Z)
LTCR: Long-Text Chinese Rumor Detection Dataset [14.503426768310536]
A Long-Text Chinese Rumor dataset named LTCR is proposed. The dataset consists of 1,729 and 500 pieces of real and fake news, respectively. The average lengths of real and fake news are approximately 230 and 152 characters, respectively.
arXiv Detail & Related papers (2023-06-12T16:03:36Z)
Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models. We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks. Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection. The results confirm the hypothesis that cross-lingual evidence can serve as a feature for fake news detection.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
UrduFake@FIRE2020: Shared Track on Fake News Identification in Urdu [62.6928395368204]
This paper gives an overview of the first shared task at FIRE 2020 on fake news detection in the Urdu language. The goal is to identify fake news using a dataset composed of 900 annotated news articles for training and 400 news articles for testing. The dataset contains news in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and (v) Business.
arXiv Detail & Related papers (2022-07-25T03:46:51Z)
Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2020 [62.6928395368204]
The task was posed as a binary classification task, in which the goal is to differentiate between real and fake news. We provided a dataset divided into 900 annotated news articles for training and 400 news articles for testing. 42 teams from 6 different countries (India, China, Egypt, Germany, Pakistan, and the UK) registered for the task.
arXiv Detail & Related papers (2022-07-25T03:41:32Z)
Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda. Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles. Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 - 7.69% F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
Similarity Detection Pipeline for Crawling a Topic Related Fake News Corpus [0.0]
We propose a new, publicly available German topic related corpus for fake news detection. We also develop a pipeline for crawling similar news articles. As our third contribution, we conduct different learning experiments to detect fake news.
arXiv Detail & Related papers (2020-09-28T14:35:31Z)
Birds of a Feather Flock Together: Satirical News Detection via Language Model Differentiation [7.556286423133077]
In satirical news, the lexical and pragmatic attributes of the context are the key factors in amusing the readers. We propose a method that differentiates satirical news from true news.
arXiv Detail & Related papers (2020-07-04T18:46:36Z)
Satirical News Detection with Semantic Feature Extraction and Game-theoretic Rough Sets [5.326582776477692]
We propose a semantic feature-based approach to detect satirical news tweets. Features are extracted by exploring inconsistencies in phrases, entities, and between main and relative clauses. We apply a game-theoretic rough set model to detect satirical news, in which probabilistic thresholds are derived by game equilibrium and a repetition learning mechanism.
arXiv Detail & Related papers (2020-04-08T03:22:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.