Make Satire Boring Again: Reducing Stylistic Bias of Satirical Corpus by Utilizing Generative LLMs
- URL: http://arxiv.org/abs/2412.09247v1
- Date: Thu, 12 Dec 2024 12:57:55 GMT
- Title: Make Satire Boring Again: Reducing Stylistic Bias of Satirical Corpus by Utilizing Generative LLMs
- Authors: Asli Umay Ozturk, Recep Firat Cekinel
- Abstract summary: This study proposes a debiasing approach for satire detection, focusing on reducing biases in training data by utilizing generative large language models.
Results show that the debiasing method enhances the robustness and generalizability of the models for satire and irony detection tasks in Turkish and English.
This work curates and presents the Turkish Satirical News Dataset with detailed human annotations and case studies on classification, debiasing, and explainability.
- Abstract: Satire detection is essential for accurately extracting opinions from textual data and combating misinformation online. However, the lack of diverse corpora for satire leads to the problem of stylistic bias, which impacts models' detection performance. This study proposes a debiasing approach for satire detection, focusing on reducing biases in training data by utilizing generative large language models. The approach is evaluated in both cross-domain (irony detection) and cross-lingual (English) settings. Results show that the debiasing method enhances the robustness and generalizability of the models for satire and irony detection tasks in Turkish and English. However, its impact on causal language models, such as Llama-3.1, is limited. Additionally, this work curates and presents the Turkish Satirical News Dataset with detailed human annotations and case studies on classification, debiasing, and explainability.
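The abstract leaves the rewriting step implicit, but the core idea, using a generative LLM to strip the satirical style from training texts so a classifier must rely on content rather than style cues, can be sketched as below. The model name, prompt wording, and use of the OpenAI SDK are illustrative assumptions, not the paper's actual setup.

```python
# A minimal sketch of style debiasing: rewrite satirical training texts
# in a neutral register while keeping their labels. Model and prompt are
# assumptions for illustration, not the paper's exact configuration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Rewrite the following news text in a plain, neutral journalistic style. "
    "Preserve the factual content; do not add commentary.\n\n{text}"
)

def debias_style(text: str, model: str = "gpt-4o-mini") -> str:
    """Return a style-neutral paraphrase of `text`."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        temperature=0.0,  # keep rewrites as deterministic as possible
    )
    return response.choices[0].message.content

# Usage: neutralize only the satirical class, then train on the mix of
# original non-satirical texts and rewritten satirical texts.
# debiased = [(debias_style(t), y) for t, y in corpus if y == "satire"]
```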
Related papers
- Religious Bias Landscape in Language and Text-to-Image Models: Analysis, Detection, and Debiasing Strategies [16.177734242454193]
The widespread adoption of language models highlights the need for critical examinations of their inherent biases.
This study systematically investigates religious bias in both language models and text-to-image generation models.
arXiv Detail & Related papers (2025-01-14T21:10:08Z)
- Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot consider error position and type simultaneously.
We build an FG-TED model to predict both addition and omission errors.
Experiments show that our model can identify error type and position concurrently and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models (a minimal sketch of the projection idea appears after this list).
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Challenges in Measuring Bias via Open-Ended Language Generation [1.5552869983952944]
We analyze how specific choices of prompt sets, metrics, automatic tools and sampling strategies affect bias results.
We provide recommendations for reporting biases in open-ended language generation.
arXiv Detail & Related papers (2022-05-23T19:57:15Z)
- The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color [73.70233477125781]
We show that reporting bias negatively impacts and inherently limits text-only training.
We then demonstrate that multimodal models can leverage their visual training to mitigate these effects.
arXiv Detail & Related papers (2021-10-15T16:28:17Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
- On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z)
- Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP [10.936043362876651]
We propose a decoding algorithm that reduces the probability of a model producing problematic text.
While our approach by no means eliminates the issue of language models generating biased text, we believe it is an important step in this direction (a simplified sketch of the decoding step appears after this list).
arXiv Detail & Related papers (2021-02-28T11:07:37Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work presents a comparison of a neural model and character language models with varying amounts of target-language data.
Our usage scenario is interactive correction with nearly zero training examples, improving the models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Satirical News Detection with Semantic Feature Extraction and Game-theoretic Rough Sets [5.326582776477692]
We propose a semantic feature based approach to detect satirical news tweets.
Features are extracted by exploring inconsistencies in phrases, in entities, and between main and relative clauses.
We apply a game-theoretic rough set model to detect satirical news, in which probabilistic thresholds are derived by game equilibrium and a repetition learning mechanism.
arXiv Detail & Related papers (2020-04-08T03:22:21Z)
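As noted above, here is a minimal sketch of the "projecting out biased directions" idea from the vision-language debiasing entry. It assumes the bias directions are already available as vectors in the text-embedding space; the paper's calibrated projection matrix is more involved than this plain orthogonal projection.

```python
import numpy as np

def project_out_bias(embeddings: np.ndarray, bias_dirs: np.ndarray) -> np.ndarray:
    """Remove the subspace spanned by bias_dirs from each embedding row.

    embeddings: (n, d) array of text embeddings.
    bias_dirs:  (k, d) array of bias direction vectors, k <= d.
    """
    # Build an orthonormal basis for the bias subspace (columns of q).
    q, _ = np.linalg.qr(bias_dirs.T)
    # Subtract each embedding's projection onto that subspace.
    return embeddings - embeddings @ q @ q.T
```

Classification would then proceed with the debiased embeddings in place of the originals.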
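Likewise, a rough sketch of the self-debiasing decoding step from the corpus-based-bias entry. The scheme below, which compares next-token distributions with and without a bias-encouraging prefix and exponentially down-weights tokens the prefix favors, is a simplified assumption about the algorithm, not a faithful reimplementation.

```python
import torch

def self_debias_probs(logits_plain: torch.Tensor,
                      logits_biased: torch.Tensor,
                      decay: float = 50.0) -> torch.Tensor:
    """Rescale next-token probabilities to suppress bias-favored tokens.

    logits_plain:  next-token logits for the ordinary prompt.
    logits_biased: next-token logits for the same prompt prefixed with a
                   description that encourages the undesired behavior.
    """
    p = torch.softmax(logits_plain, dim=-1)
    p_biased = torch.softmax(logits_biased, dim=-1)
    delta = p - p_biased
    # Tokens that the biased prefix makes *more* likely (delta < 0) are
    # exponentially down-weighted; all other tokens are left unchanged.
    scale = torch.where(delta < 0, torch.exp(decay * delta), torch.ones_like(p))
    adjusted = scale * p
    return adjusted / adjusted.sum(dim=-1, keepdim=True)  # renormalize
```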