Biased or Flawed? Mitigating Stereotypes in Generative Language Models by Addressing Task-Specific Flaws
- URL: http://arxiv.org/abs/2412.11414v1
- Date: Mon, 16 Dec 2024 03:29:08 GMT
- Title: Biased or Flawed? Mitigating Stereotypes in Generative Language Models by Addressing Task-Specific Flaws
- Authors: Akshita Jha, Sanchit Kabra, Chandan K. Reddy
- Abstract summary: Generative language models often reflect and amplify societal biases in their outputs.
We propose a targeted stereotype mitigation framework that implicitly mitigates observed stereotypes in generative models.
We reduce stereotypical outputs by over 60% across multiple dimensions.
- Score: 12.559028963968247
- License:
- Abstract: Recent studies have shown that generative language models often reflect and amplify societal biases in their outputs. However, these studies frequently conflate observed biases with other task-specific shortcomings, such as comprehension failure. For example, when a model misinterprets a text and produces a response that reinforces a stereotype, it becomes difficult to determine whether the issue arises from inherent bias or from a misunderstanding of the given content. In this paper, we conduct a multi-faceted evaluation that distinctly disentangles bias from flaws within the reading comprehension task. We propose a targeted stereotype mitigation framework that implicitly mitigates observed stereotypes in generative models through instruction-tuning on general-purpose datasets. We reduce stereotypical outputs by over 60% across multiple dimensions -- including nationality, age, gender, disability, and physical appearance -- by addressing comprehension-based failures, and without relying on explicit debiasing techniques. We evaluate several state-of-the-art generative models to demonstrate the effectiveness of our approach while maintaining the overall utility. Our findings highlight the need to critically disentangle the concept of `bias' from other types of errors to build more targeted and effective mitigation strategies. CONTENT WARNING: Some examples contain offensive stereotypes.
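To make the disentanglement concrete, here is a minimal Python sketch of how a single response to a reading-comprehension item could be labeled as correct, biased, or a comprehension flaw. The item fields and the labeling rule are simplified assumptions for illustration, not the paper's exact protocol.

```python
from dataclasses import dataclass

@dataclass
class Item:
    context: str             # passage shown to the model
    question: str
    ambiguous: bool          # True if the passage never identifies the person
    stereotyped_answer: str  # the answer a stereotype would predict
    correct_answer: str      # e.g. "unknown" for ambiguous passages

def label_response(item: Item, response: str) -> str:
    """Label one model answer as correct, biased, or a comprehension flaw."""
    if response == item.correct_answer:
        return "correct"
    if item.ambiguous and response == item.stereotyped_answer:
        # No evidence in the passage, yet the model chose the stereotyped
        # target: count this as a bias signal.
        return "biased"
    # Any other wrong answer is treated as a comprehension failure, not bias.
    return "comprehension_flaw"
```

The key distinction: a wrong answer only counts as a bias signal when the passage is ambiguous and the model nevertheless picks the stereotyped target; all other errors are attributed to comprehension.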
Related papers
- Covert Bias: The Severity of Social Views' Unalignment in Language Models Towards Implicit and Explicit Opinion [0.40964539027092917]
We evaluate the severity of bias toward a view by probing a biased model on edge cases designed to elicit excessive bias.
Our findings reveal a discrepancy in LLM performance in identifying implicit and explicit opinions, with a general tendency of bias toward explicit opinions of opposing stances.
The direct, incautious responses of the unaligned models suggest that their decisiveness requires further refinement.
arXiv Detail & Related papers (2024-08-15T15:23:00Z) - Debiasing Multimodal Sarcasm Detection with Contrastive Learning [5.43710908542843]
We propose a novel debiasing multimodal sarcasm detection framework with contrastive learning.
In particular, we first design counterfactual data augmentation to construct the positive samples with dissimilar word biases.
We devise an adapted debiasing contrastive learning mechanism to empower the model to learn robust task-relevant features.
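A minimal sketch of such a debiasing contrastive objective, assuming batched (anchor, counterfactual positive) embedding pairs; this is a generic InfoNCE-style loss for illustration, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def debias_contrastive_loss(anchors: torch.Tensor,
                            positives: torch.Tensor,
                            temperature: float = 0.07) -> torch.Tensor:
    """anchors, positives: (batch, dim); row i of each forms a positive pair."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature                    # (batch, batch) similarities
    targets = torch.arange(a.size(0), device=a.device)  # positives on the diagonal
    # Pull each anchor toward its counterfactual positive, push away the rest.
    return F.cross_entropy(logits, targets)

# Usage: loss = debias_contrastive_loss(encode(batch), encode(counterfactual_batch))
```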
arXiv Detail & Related papers (2023-12-16T16:14:50Z) - Exposing Bias in Online Communities through Large-Scale Language Models [3.04585143845864]
This work leverages the tendency of language models to absorb bias in order to explore the biases of six different online communities.
The bias of the resulting models is evaluated by prompting them with different demographics and comparing the sentiment and toxicity of the resulting generations.
This work not only affirms how easily bias is absorbed from training data but also presents a scalable method to identify and compare the bias of different datasets or communities.
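A hedged sketch of this evaluation loop: prompt a community-tuned model with different demographic subjects and score the generations. The model name is a stand-in, and toxicity scoring (e.g. via the Perspective API) is omitted for brevity.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in for a community model
sentiment = pipeline("sentiment-analysis")             # default sentiment classifier

for group in ["women", "men", "immigrants", "old people"]:
    prompt = f"{group} are"
    text = generator(prompt, max_new_tokens=30, do_sample=True)[0]["generated_text"]
    result = sentiment(text)[0]
    print(f"{group:12s} {result['label']:8s} {result['score']:.3f}")
```

Comparing these score distributions across models fine-tuned on different communities indicates which community corpora a model has absorbed bias from.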
arXiv Detail & Related papers (2023-06-04T08:09:26Z) - Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
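One simple diagnostic in this spirit (not the paper's mitigation method): compare a learned metric's score for a caption against its gender-swapped counterpart; a systematic gap indicates metric bias. Here `metric_score` is a placeholder for any model-based metric such as CLIPScore.

```python
# Word-level gender swaps; a real implementation would handle casing,
# morphology, and names more carefully.
SWAPS = {"man": "woman", "woman": "man", "he": "she", "she": "he",
         "boy": "girl", "girl": "boy"}

def gender_swap(caption: str) -> str:
    return " ".join(SWAPS.get(w, w) for w in caption.lower().split())

def metric_gap(metric_score, image, caption: str) -> float:
    """Score difference between a caption and its gender-swapped counterpart;
    gaps far from zero across a dataset point to a gender-biased metric."""
    return metric_score(image, caption) - metric_score(image, gender_swap(caption))
```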
arXiv Detail & Related papers (2023-05-24T04:27:40Z) - Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
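A minimal sketch of the projection idea, assuming the biased directions are estimated from embedding differences of prompt pairs (e.g. "a photo of a man" vs. "a photo of a woman"); the calibration of the projection matrix described in the paper is omitted.

```python
import numpy as np

def debias_projection(bias_dirs: np.ndarray) -> np.ndarray:
    """bias_dirs: (k, d) rows spanning the biased subspace.
    Returns the (d, d) projection onto its orthogonal complement."""
    v, _ = np.linalg.qr(bias_dirs.T)               # (d, k) orthonormal basis
    return np.eye(bias_dirs.shape[1]) - v @ v.T    # P = I - V V^T

# Usage: z_debiased = text_embedding @ debias_projection(bias_dirs)
# Downstream classifiers and generators then operate on z_debiased.
```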
arXiv Detail & Related papers (2023-01-31T20:09:33Z) - Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale [61.555788332182395]
We investigate the potential for machine learning models to amplify dangerous and complex stereotypes.
We find that a broad range of ordinary prompts produces stereotypes, including prompts that simply mention traits, descriptors, occupations, or objects.
arXiv Detail & Related papers (2022-11-07T18:31:07Z) - The Birth of Bias: A case study on the evolution of gender bias in an English language model [1.6344851071810076]
We use a relatively small language model with an LSTM architecture, trained on an English Wikipedia corpus.
We find that the representation of gender is dynamic and identify different phases during training.
We show that gender information is represented increasingly locally in the input embeddings of the model.
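A hedged sketch of one way to quantify such local representation: measure how strongly individual input embeddings align with a "he minus she" direction. The embedding matrix `emb` and vocabulary map `tok2id` are hypothetical stand-ins, and the paper's phase analysis over training checkpoints is omitted.

```python
import numpy as np

def gender_alignment(emb: np.ndarray, tok2id: dict, word: str) -> float:
    """Cosine similarity between a word's input embedding and the he-she axis."""
    axis = emb[tok2id["he"]] - emb[tok2id["she"]]
    v = emb[tok2id[word]]
    return float(v @ axis / (np.linalg.norm(v) * np.linalg.norm(axis)))

# Increasingly local gender information would show up as growing
# |gender_alignment| values for individual words (e.g. "nurse", "doctor").
```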
arXiv Detail & Related papers (2022-07-21T00:59:04Z) - Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
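A minimal sketch of instance reweighting in this spirit: weight each training example inversely to the frequency of its (label, demographic) combination so that no author group dominates a class. This illustrates the general recipe, not the paper's exact weighting scheme.

```python
from collections import Counter

def balanced_weights(labels, demographics):
    """One weight per example, inversely proportional to the joint
    (label, demographic) frequency, normalized to mean 1."""
    joint = Counter(zip(labels, demographics))
    n, k = len(labels), len(joint)
    return [n / (k * joint[(y, g)]) for y, g in zip(labels, demographics)]

# Usage: multiply per-example losses by these weights, e.g. with
# torch.nn.CrossEntropyLoss(reduction="none") before averaging.
```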
arXiv Detail & Related papers (2021-09-16T23:40:28Z) - Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) risk manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z) - UnQovering Stereotyping Biases via Underspecified Questions [68.81749777034409]
We present UNQOVER, a framework to probe and quantify biases through underspecified questions.
We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors.
We use this metric to analyze four important classes of stereotypes: gender, nationality, ethnicity, and religion.
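A minimal sketch of the probing setup: an underspecified context names two people, the question cannot be answered from the context, and a systematic score gap between the two names signals a stereotyped preference. The example item is illustrative, and the paper's corrections for the two reasoning errors (such as positional dependence) are omitted.

```python
from transformers import pipeline

qa = pipeline("question-answering")  # default extractive QA model

context = "John got off the flight to visit Mary."
question = "Who was a bad driver?"   # nothing in the context answers this

for cand in qa(question=question, context=context, top_k=2):
    print(f"{cand['answer']:>6s}  score={cand['score']:.4f}")
# With an unbiased model, no underspecified item should systematically favor
# one name; aggregating these gaps over templates yields the bias metric.
```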
arXiv Detail & Related papers (2020-10-06T01:49:52Z)