Mitigating Gender Bias in Distilled Language Models via Counterfactual
Role Reversal
- URL: http://arxiv.org/abs/2203.12574v1
- Date: Wed, 23 Mar 2022 17:34:35 GMT
- Title: Mitigating Gender Bias in Distilled Language Models via Counterfactual
Role Reversal
- Authors: Umang Gupta, Jwala Dhamala, Varun Kumar, Apurv Verma, Yada
Pruksachatkun, Satyapriya Krishna, Rahul Gupta, Kai-Wei Chang, Greg Ver
Steeg, Aram Galstyan
- Abstract summary: Language models can be biased in multiple ways, including the unfounded association of male and female genders with gender-neutral professions.
We present a novel approach to mitigate gender disparity in text generation by learning a fair model during knowledge distillation.
We observe that models that reduce gender polarity in language generation do not improve embedding fairness or downstream classification fairness.
- Score: 74.52580517012832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models excel at generating coherent text, and model compression
techniques such as knowledge distillation have enabled their use in
resource-constrained settings. However, these models can be biased in multiple
ways, including the unfounded association of male and female genders with
gender-neutral professions. Therefore, knowledge distillation without any
fairness constraints may preserve or exaggerate the teacher model's biases onto
the distilled model. To this end, we present a novel approach to mitigate
gender disparity in text generation by learning a fair model during knowledge
distillation. We propose two modifications to the base knowledge distillation
based on counterfactual role reversal: modifying teacher
probabilities and augmenting the training set. We evaluate gender polarity
across professions in open-ended text generated from the resulting distilled
and finetuned GPT-2 models and demonstrate a substantial
reduction in gender disparity with only a minor compromise in utility. Finally,
we observe that language models that reduce gender polarity in language
generation do not improve embedding fairness or downstream classification
fairness.
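To make the two modifications concrete, below is a minimal, hypothetical sketch, not the authors' released code: the word list, model names, `role_reverse` helper, and mixing weight `alpha` are illustrative assumptions, and a real implementation would need much richer gendered word lists and careful handling of tokenization length mismatches.

```python
# Hypothetical sketch of counterfactual-role-reversal distillation.
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Toy dictionary of gendered word pairs (the paper's lists are far larger).
SWAP = {"he": "she", "she": "he", "his": "her", "her": "his",
        "him": "her", "man": "woman", "woman": "man"}

def role_reverse(text: str) -> str:
    """Build the counterfactual sentence by swapping gendered words."""
    return " ".join(SWAP.get(w, w) for w in text.split())

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
teacher = GPT2LMHeadModel.from_pretrained("gpt2-large").eval()
student = GPT2LMHeadModel.from_pretrained("gpt2")  # smaller model being distilled

def distill_loss(text: str, alpha: float = 0.5, temperature: float = 2.0) -> torch.Tensor:
    """KL distillation loss with counterfactually mixed teacher probabilities."""
    enc = tokenizer(text, return_tensors="pt")
    enc_cf = tokenizer(role_reverse(text), return_tensors="pt")

    with torch.no_grad():
        p_orig = F.softmax(teacher(**enc).logits / temperature, dim=-1)
        p_cf = F.softmax(teacher(**enc_cf).logits / temperature, dim=-1)

    # Modification 1: mix next-token distributions from the original and the
    # role-reversed input (only when both inputs tokenize to the same length).
    p_teacher = alpha * p_orig + (1 - alpha) * p_cf if p_orig.shape == p_cf.shape else p_orig

    log_q = F.log_softmax(student(**enc).logits / temperature, dim=-1)
    return F.kl_div(log_q, p_teacher, reduction="batchmean") * temperature ** 2

# Modification 2: counterfactual data augmentation -- the student also trains
# on the role-reversed version of every sentence.
batch = ["The doctor said he would call back soon."]
batch += [role_reverse(s) for s in batch]
loss = sum(distill_loss(s) for s in batch) / len(batch)
loss.backward()
```

The abstract's evaluation then measures gender polarity across professions in open-ended text generated by the distilled and finetuned GPT-2 models, reporting substantially lower gender disparity at only a minor cost in utility.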
Related papers
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z) - DiFair: A Benchmark for Disentangled Assessment of Gender Knowledge and
Bias [13.928591341824248]
Debiasing techniques have been proposed to mitigate the gender bias that is prevalent in pretrained language models.
These are often evaluated on datasets that check the extent to which the model is gender-neutral in its predictions.
This evaluation protocol overlooks the possible adverse impact of bias mitigation on useful gender knowledge.
arXiv Detail & Related papers (2023-10-22T15:27:16Z) - Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender
Perturbation over Fairytale Texts [87.62403265382734]
Recent studies show that traditional fairytales are rife with harmful gender biases.
This work aims to assess learned biases of language models by evaluating their robustness against gender perturbations.
arXiv Detail & Related papers (2023-10-16T22:25:09Z) - Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
arXiv Detail & Related papers (2023-05-24T04:27:40Z) - Exploiting Biased Models to De-bias Text: A Gender-Fair Rewriting Model [32.21372089380992]
We train a rewriting model for German without the need for elaborate handcrafted rules.
The outputs of this model increased gender-fairness as shown in a human evaluation study.
arXiv Detail & Related papers (2023-05-18T17:35:28Z) - Exploring Gender Bias in Retrieval Models [2.594412743115663]
Mitigating gender bias in information retrieval is important to avoid propagating stereotypes.
We employ a dataset consisting of two components: (1) relevance of a document to a query and (2) "gender" of a document.
We show that pre-trained IR models do not perform well in zero-shot retrieval tasks without full fine-tuning of a large pre-trained BERT encoder.
We also illustrate that pre-trained models have gender biases that result in retrieved articles being more often associated with the male gender than the female gender.
arXiv Detail & Related papers (2022-08-02T21:12:05Z) - The Birth of Bias: A case study on the evolution of gender bias in an
English language model [1.6344851071810076]
We use a relatively small language model, using the LSTM architecture trained on an English Wikipedia corpus.
We find that the representation of gender is dynamic and identify different phases during training.
We show that gender information is represented increasingly locally in the input embeddings of the model.
arXiv Detail & Related papers (2022-07-21T00:59:04Z) - Improving Gender Fairness of Pre-Trained Language Models without
Catastrophic Forgetting [88.83117372793737]
Forgetting information in the original training data may damage the model's downstream performance by a large margin.
We propose GEnder Equality Prompt (GEEP) to improve gender fairness of pre-trained models with less forgetting.
arXiv Detail & Related papers (2021-10-11T15:52:16Z) - First the worst: Finding better gender translations during beam search [19.921216907778447]
We focus on gender bias resulting from systematic errors in grammatical gender translation.
We experiment with reranking n-best lists using gender features obtained automatically from the source sentence.
We find that a combination of these techniques allows large gains in WinoMT accuracy without requiring additional bilingual data or an additional NMT model.
arXiv Detail & Related papers (2021-04-15T12:53:30Z) - Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large-scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.