SaRoCo: Detecting Satire in a Novel Romanian Corpus of News Articles
- URL: http://arxiv.org/abs/2105.06456v2
- Date: Fri, 14 May 2021 05:24:26 GMT
- Title: SaRoCo: Detecting Satire in a Novel Romanian Corpus of News Articles
- Authors: Ana-Cristina Rogoz, Mihaela Gaman, Radu Tudor Ionescu
- Abstract summary: This work introduces one of the largest corpora for satire detection regardless of language and the only one for the Romanian language.
We conduct experiments with two state-of-the-art deep neural models, resulting in a set of strong baselines for our novel corpus.
Our results show that the machine-level accuracy for satire detection in Romanian is quite low (under 73% on the test set) compared to the human-level accuracy (87%), leaving enough room for improvement in future research.
- Score: 15.877673959068455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we introduce a corpus for satire detection in Romanian news. We
gathered 55,608 public news articles from multiple real and satirical news
sources, composing one of the largest corpora for satire detection regardless
of language and the only one for the Romanian language. We provide an official
split of the text samples, such that training news articles belong to different
sources than test news articles, thus ensuring that models do not achieve high
performance simply due to overfitting. We conduct experiments with two
state-of-the-art deep neural models, resulting in a set of strong baselines for
our novel corpus. Our results show that the machine-level accuracy for satire
detection in Romanian is quite low (under 73% on the test set) compared to the
human-level accuracy (87%), leaving enough room for improvement in future
research.
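The source-disjoint split described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure or the official SaRoCo split. The function name `source_disjoint_split`, the `(source, text, label)` tuple layout, the toy site names, and the `test_fraction` parameter are all assumptions made for illustration. The sketch also assumes each source publishes only one class (real or satirical) and that every class has at least two sources.

```python
import random
from collections import defaultdict


def source_disjoint_split(articles, test_fraction=0.25, seed=0):
    """Split articles so that no news source appears in both train and test.

    `articles` is a list of (source, text, label) tuples (hypothetical layout).
    Sources are held out separately for each label so that both classes
    appear on both sides of the split.
    """
    # Collect the set of sources observed for each label.
    sources_by_label = defaultdict(set)
    for source, _text, label in articles:
        sources_by_label[label].add(source)

    rng = random.Random(seed)
    test_sources = set()
    for label in sorted(sources_by_label):
        sources = sorted(sources_by_label[label])
        rng.shuffle(sources)
        # Hold out at least one source per label.
        n_test = max(1, round(len(sources) * test_fraction))
        test_sources.update(sources[:n_test])

    train = [a for a in articles if a[0] not in test_sources]
    test = [a for a in articles if a[0] in test_sources]
    return train, test


# Toy data with made-up source names; label 0 = real, 1 = satirical.
articles = [
    ("real_site_a", "text 1", 0), ("real_site_a", "text 2", 0),
    ("real_site_b", "text 3", 0), ("real_site_c", "text 4", 0),
    ("satire_site_a", "text 5", 1), ("satire_site_b", "text 6", 1),
    ("satire_site_b", "text 7", 1), ("satire_site_c", "text 8", 1),
]
train, test = source_disjoint_split(articles)
train_sources = {a[0] for a in train}
test_sources = {a[0] for a in test}
```

Because entire sources are held out, a classifier cannot score well on the test set merely by memorizing the vocabulary or formatting quirks of a particular publication.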
Related papers
- MuSaRoNews: A Multidomain, Multimodal Satire Dataset from Romanian News Articles [1.232097230344824]
This work introduces a multimodal corpus for satire detection in Romanian news articles named MuSaRoNews.
Specifically, we gathered 117,834 public news articles from real and satirical news sources, composing the first multimodal corpus for satire detection in the Romanian language.
arXiv Detail & Related papers (2025-04-10T15:02:59Z)
- SaRoHead: A Dataset for Satire Detection in Romanian Multi-Domain News Headlines [1.6976911886883272]
SaRoHead is the first corpus for satire detection in Romanian multi-domain news headlines.
Our findings show that the clickbait used in some non-satirical headlines significantly influences the model.
arXiv Detail & Related papers (2025-04-10T10:03:29Z)
- Make Satire Boring Again: Reducing Stylistic Bias of Satirical Corpus by Utilizing Generative LLMs [0.0]
This study proposes a debiasing approach for satire detection, focusing on reducing biases in training data by utilizing generative large language models.
Results show that the debiasing method enhances the robustness and generalizability of the models for satire and irony detection tasks in Turkish and English.
This work curates and presents the Turkish Satirical News dataset with detailed human annotations, with case studies on classification, debiasing, and explainability.
arXiv Detail & Related papers (2024-12-12T12:57:55Z)
- Adapting Fake News Detection to the Era of Large Language Models [48.5847914481222]
We study the interplay between machine-(paraphrased) real news, machine-generated fake news, human-written fake news, and human-written real news.
Our experiments reveal an interesting pattern that detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa.
arXiv Detail & Related papers (2023-11-02T08:39:45Z)
- Tackling Fake News in Bengali: Unraveling the Impact of Summarization vs. Augmentation on Pre-trained Language Models [0.0]
We propose a methodology consisting of four distinct approaches to classify fake news articles in Bengali.
Our approach includes translating English news articles and using augmentation techniques to curb the deficit of fake news articles.
We show the effectiveness of summarization and augmentation in the case of Bengali fake news detection.
arXiv Detail & Related papers (2023-07-13T14:50:55Z)
- LTCR: Long-Text Chinese Rumor Detection Dataset [14.503426768310536]
A Long-Text Chinese Rumor detection dataset named LTCR is proposed.
The dataset consists of 1,729 and 500 pieces of real and fake news, respectively.
The average lengths of real and fake news are approximately 230 and 152 characters.
arXiv Detail & Related papers (2023-06-12T16:03:36Z)
- Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis that cross-lingual evidence can serve as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- UrduFake@FIRE2020: Shared Track on Fake News Identification in Urdu [62.6928395368204]
This paper gives an overview of the first shared task at FIRE 2020 on fake news detection in the Urdu language.
The goal is to identify fake news using a dataset composed of 900 annotated news articles for training and 400 news articles for testing.
The dataset contains news in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and (v) Business.
arXiv Detail & Related papers (2022-07-25T03:46:51Z)
- Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2020 [62.6928395368204]
The task was posed as a binary classification task, in which the goal is to differentiate between real and fake news.
We provided a dataset divided into 900 annotated news articles for training and 400 news articles for testing.
42 teams from 6 different countries (India, China, Egypt, Germany, Pakistan, and the UK) registered for the task.
arXiv Detail & Related papers (2022-07-25T03:41:32Z)
- Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62 - 7.69% F1 score on two public datasets.
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
- Similarity Detection Pipeline for Crawling a Topic Related Fake News Corpus [0.0]
We propose a new, publicly available, topic-related German corpus for fake news detection.
We also develop a pipeline for crawling similar news articles.
As our third contribution, we conduct different learning experiments to detect fake news.
arXiv Detail & Related papers (2020-09-28T14:35:31Z)
- Birds of a Feather Flock Together: Satirical News Detection via Language Model Differentiation [7.556286423133077]
In satirical news, the lexical and pragmatical attributes of the context are the key factors in amusing the readers.
We propose a method that differentiates satirical news from true news.
arXiv Detail & Related papers (2020-07-04T18:46:36Z)
- Satirical News Detection with Semantic Feature Extraction and Game-theoretic Rough Sets [5.326582776477692]
We propose a semantic feature-based approach to detect satirical news tweets.
Features are extracted by exploring inconsistencies in phrases, entities, and between main and relative clauses.
We apply a game-theoretic rough set model to detect satirical news, in which probabilistic thresholds are derived by game equilibrium and a repetition learning mechanism.
arXiv Detail & Related papers (2020-04-08T03:22:21Z)