Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender
Perturbation over Fairytale Texts
- URL: http://arxiv.org/abs/2310.10865v2
- Date: Wed, 15 Nov 2023 21:32:28 GMT
- Title: Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender
Perturbation over Fairytale Texts
- Authors: Christina Chance, Da Yin, Dakuo Wang, Kai-Wei Chang
- Abstract summary: Recent studies show that traditional fairytales are rife with harmful gender biases.
This work aims to assess learned biases of language models by evaluating their robustness against gender perturbations.
- Score: 87.62403265382734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies show that traditional fairytales are rife with harmful gender
biases. To help mitigate these gender biases in fairytales, this work aims to
assess learned biases of language models by evaluating their robustness against
gender perturbations. Specifically, we focus on Question Answering (QA) tasks
in fairytales. Using counterfactual data augmentation to the FairytaleQA
dataset, we evaluate model robustness against swapped gender character
information, and then mitigate learned biases by introducing counterfactual
gender stereotypes during training time. We additionally introduce a novel
approach that utilizes the massive vocabulary of language models to support
text genres beyond fairytales. Our experimental results suggest that models are
sensitive to gender perturbations, with significant performance drops compared
to the original testing set. However, when first fine-tuned on a counterfactual
training dataset, models become less sensitive to the anti-gender-stereotyped
text introduced later.
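
To make the augmentation step concrete, below is a minimal sketch of word-level gender swapping of the kind used to build a counterfactual QA test or training set. The swap list, function names, and regex-based tokenization are illustrative assumptions for this sketch, not the authors' actual FairytaleQA pipeline, which also handles character names and richer context.

```python
# Minimal sketch of counterfactual gender perturbation (illustrative only).
import re

# Hypothetical bidirectional swap list for common gendered words.
GENDER_SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",   # note: possessive vs. objective "her" is not disambiguated here
    "his": "her", "hers": "his",
    "prince": "princess", "princess": "prince",
    "king": "queen", "queen": "king",
    "boy": "girl", "girl": "boy",
    "father": "mother", "mother": "father",
    "son": "daughter", "daughter": "son",
}

def swap_gendered_words(text: str) -> str:
    """Return a counterfactual copy of `text` with gendered words swapped."""
    def replace(match: re.Match) -> str:
        word = match.group(0)
        swapped = GENDER_SWAPS.get(word.lower())
        if swapped is None:
            return word
        # Preserve simple sentence-initial capitalization ("He" -> "She").
        return swapped.capitalize() if word[0].isupper() else swapped

    return re.sub(r"[A-Za-z]+", replace, text)

if __name__ == "__main__":
    original = "The prince asked his father if he could wake the princess."
    print(swap_gendered_words(original))
    # -> "The princess asked her mother if she could wake the prince."
```

Applying such a swap to every story passage, question, and answer yields the kind of gender-perturbed evaluation set (and counterfactual training set) against which model robustness is measured.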
Related papers
- Are Models Biased on Text without Gender-related Language? [14.931375031931386]
We introduce UnStereoEval (USE), a novel framework for investigating gender bias in stereotype-free scenarios.
USE defines a sentence-level score based on pretraining data statistics to determine whether a sentence contains minimal word-gender associations.
We find low fairness across all 28 tested models, suggesting that bias does not solely stem from the presence of gender-related words.
arXiv Detail & Related papers (2024-05-01T15:51:15Z) - DiFair: A Benchmark for Disentangled Assessment of Gender Knowledge and
Bias [13.928591341824248]
Debiasing techniques have been proposed to mitigate the gender bias that is prevalent in pretrained language models.
These are often evaluated on datasets that check the extent to which the model is gender-neutral in its predictions.
This evaluation protocol overlooks the possible adverse impact of bias mitigation on useful gender knowledge.
arXiv Detail & Related papers (2023-10-22T15:27:16Z) - The Impact of Debiasing on the Performance of Language Models in
Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide-range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z) - Model-Agnostic Gender Debiased Image Captioning [29.640940966944697]
Image captioning models are known to perpetuate and amplify harmful societal bias in the training set.
We propose a framework, called LIBRA, that learns from synthetically biased samples to decrease both types of biases.
arXiv Detail & Related papers (2023-04-07T15:30:49Z) - A Moral- and Event-Centric Inspection of Gender Bias in Fairy Tales at
A Large Scale [50.92540580640479]
We computationally analyze gender bias in a fairy tale dataset containing 624 fairy tales from 7 different cultures.
We find that the number of male characters is two times that of female characters, showing a disproportionate gender representation.
Female characters turn out to be more associated with care-, loyalty-, and sanctity-related moral words, while male characters are more associated with fairness- and authority-related moral words.
arXiv Detail & Related papers (2022-11-25T19:38:09Z) - Improving Gender Fairness of Pre-Trained Language Models without
Catastrophic Forgetting [88.83117372793737]
Forgetting information in the original training data may damage the model's downstream performance by a large margin.
We propose GEnder Equality Prompt (GEEP) to improve gender fairness of pre-trained models with less forgetting.
arXiv Detail & Related papers (2021-10-11T15:52:16Z) - Stereotype and Skew: Quantifying Gender Bias in Pre-trained and
Fine-tuned Language Models [5.378664454650768]
This paper proposes two intuitive metrics, skew and stereotype, that quantify and analyse the gender bias present in contextual language models.
We find evidence that gender stereotype correlates approximately negatively with gender skew in out-of-the-box models, suggesting that there is a trade-off between these two forms of bias.
arXiv Detail & Related papers (2021-01-24T10:57:59Z) - Mitigating Gender Bias in Captioning Systems [56.25457065032423]
Most captioning models learn gender bias, leading to high gender prediction errors, especially for women.
We propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence.
arXiv Detail & Related papers (2020-06-15T12:16:19Z) - Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.