Language Models Get a Gender Makeover: Mitigating Gender Bias with
Few-Shot Data Interventions
- URL: http://arxiv.org/abs/2306.04597v1
- Date: Wed, 7 Jun 2023 16:50:03 GMT
- Title: Language Models Get a Gender Makeover: Mitigating Gender Bias with
Few-Shot Data Interventions
- Authors: Himanshu Thakur, Atishay Jain, Praneetha Vaddamanu, Paul Pu Liang and
Louis-Philippe Morency
- Abstract summary: Societal biases present in pre-trained large language models are a critical issue.
We propose data intervention strategies as a powerful yet simple technique to reduce gender bias in pre-trained models.
- Score: 50.67412723291881
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Societal biases present in pre-trained large language models are a critical
issue as these models have been shown to propagate biases in countless
downstream applications, rendering them unfair towards specific groups of
people. Since large-scale retraining of these models from scratch is both
time- and compute-expensive, a variety of approaches have been previously proposed
that de-bias a pre-trained model. While the majority of current
state-of-the-art debiasing methods focus on changes to the training regime, in
this paper, we propose data intervention strategies as a powerful yet simple
technique to reduce gender bias in pre-trained models. Specifically, we
empirically show that by fine-tuning a pre-trained model on only 10 de-biased
(intervened) training examples, the tendency to favor any gender is
significantly reduced. Since our proposed method only needs a few training
examples, our few-shot debiasing approach is highly feasible and practical.
Through extensive experimentation, we show that our debiasing technique
performs better than competitive state-of-the-art baselines with minimal loss
in language modeling ability.
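To make the data intervention concrete, here is a minimal sketch of the few-shot debiasing recipe, assuming a HuggingFace masked language model. The swap list, seed sentences, and training loop are illustrative stand-ins, not the paper's actual data or hyperparameters.
```python
# Illustrative sketch of few-shot debiasing via data intervention:
# fine-tune a pre-trained masked LM on a handful of gender-swapped
# (counterfactual) sentences.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "man": "woman", "woman": "man"}

def swap_gender(sentence: str) -> str:
    """Build the counterfactual of a sentence by swapping gendered words."""
    return " ".join(SWAPS.get(w, w) for w in sentence.lower().split())

# Hypothetical seed sentences; the paper fine-tunes on only ~10 examples.
seed_sentences = [
    "he is a talented engineer",
    "she takes care of the children",
    "his colleagues admire his leadership",
    "her research was widely cited",
    "the man cooked dinner for the family",
]
# Pair each sentence with its gender swap so neither gender is favored.
train_texts = seed_sentences + [swap_gender(s) for s in seed_sentences]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for _ in range(3):  # a few passes over the ten examples
    for text in train_texts:
        batch = tokenizer(text, return_tensors="pt")
        # Simplified objective: score every token; a real MLM run would
        # randomly mask tokens before computing the loss.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```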
Related papers
- Reducing Bias in Pre-trained Models by Tuning while Penalizing Change [8.862970622361747]
Deep models trained on large amounts of data often incorporate implicit biases present during training time.
New data is often expensive and hard to come by in areas such as autonomous driving or medical decision-making.
We present a method based on change penalization that takes a pre-trained model and adapts the weights to mitigate a previously detected bias.
arXiv Detail & Related papers (2024-04-18T16:12:38Z)
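The summary above leaves the penalty's form open; one common instantiation of change penalization is an L2 penalty on the distance from the pre-trained weights. The sketch below uses stand-in linear models and a hypothetical weight `lam`.
```python
# Hypothetical sketch of tuning-while-penalizing-change: fine-tune on new
# data while adding an L2 penalty on how far the weights drift from the
# pre-trained checkpoint. The paper's exact penalty may differ.
import copy
import torch
import torch.nn as nn

def penalized_loss(model: nn.Module, reference: nn.Module,
                   task_loss: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """Task loss plus a penalty on deviation from the reference weights."""
    change_penalty = sum(
        (p - p_ref).pow(2).sum()
        for p, p_ref in zip(model.parameters(), reference.parameters())
    )
    return task_loss + lam * change_penalty

model = nn.Linear(16, 2)            # stand-in for a pre-trained model
reference = copy.deepcopy(model)    # frozen copy of the pre-trained weights
for p in reference.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
loss = penalized_loss(model, reference, nn.functional.cross_entropy(model(x), y))
loss.backward()
optimizer.step()
```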
- Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
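For the projection-based entry above, the core idea can be sketched as an orthogonal projection that removes a single bias direction from the embeddings. The calibrated projection matrix of the paper is more elaborate; the vectors here are random stand-ins.
```python
# Minimal sketch of projection-based debiasing: remove a biased direction
# from text embeddings with an orthogonal projection.
import numpy as np

def debias(embeddings: np.ndarray, bias_direction: np.ndarray) -> np.ndarray:
    """Project each embedding onto the subspace orthogonal to the bias direction."""
    v = bias_direction / np.linalg.norm(bias_direction)
    projection = np.eye(v.size) - np.outer(v, v)   # P = I - v v^T
    return embeddings @ projection.T

# A bias direction can be estimated from paired prompts, e.g. the difference
# between embeddings of "a photo of a man" and "a photo of a woman".
emb = np.random.randn(4, 512)       # stand-in text embeddings
bias_dir = np.random.randn(512)     # stand-in bias direction
debiased = debias(emb, bias_dir)
# The debiased embeddings no longer have any component along the bias direction.
assert np.allclose(debiased @ bias_dir, 0, atol=1e-6)
```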
- The Birth of Bias: A case study on the evolution of gender bias in an English language model [1.6344851071810076]
We study a relatively small language model with an LSTM architecture, trained on an English Wikipedia corpus.
We find that the representation of gender is dynamic and identify different phases during training.
We show that gender information is represented increasingly locally in the input embeddings of the model.
arXiv Detail & Related papers (2022-07-21T00:59:04Z)
- Optimising Equal Opportunity Fairness in Model Training [60.0947291284978]
Existing debiasing methods, such as adversarial training and removing protected information from representations, have been shown to reduce bias.
We propose two novel training objectives that directly optimise for the widely used criterion of equal opportunity, and show that they are effective in reducing bias while maintaining high performance on two classification tasks.
arXiv Detail & Related papers (2022-05-05T01:57:58Z)
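For the equal-opportunity entry above, one plausible differentiable form adds the gap in soft true-positive rates between two protected groups to the usual cross-entropy. This is a hedged sketch, not the paper's exact objectives.
```python
# Sketch of an equal-opportunity objective: alongside cross-entropy,
# penalize the gap in (soft) true-positive rates between two groups.
import torch
import torch.nn.functional as F

def equal_opportunity_loss(logits, labels, groups, lam: float = 1.0):
    """Cross-entropy plus the soft TPR gap between groups on positive examples."""
    ce = F.cross_entropy(logits, labels)
    probs = logits.softmax(dim=-1)[:, 1]          # P(y_hat = 1)
    pos = labels == 1
    tpr_gap = torch.tensor(0.0)
    if (pos & (groups == 0)).any() and (pos & (groups == 1)).any():
        tpr0 = probs[pos & (groups == 0)].mean()  # soft TPR, group 0
        tpr1 = probs[pos & (groups == 1)].mean()  # soft TPR, group 1
        tpr_gap = (tpr0 - tpr1).abs()
    return ce + lam * tpr_gap

logits = torch.randn(32, 2, requires_grad=True)   # stand-in classifier output
labels = torch.randint(0, 2, (32,))
groups = torch.randint(0, 2, (32,))               # binary protected attribute
equal_opportunity_loss(logits, labels, groups).backward()
```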
- Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting [88.83117372793737]
Forgetting information in the original training data may damage the model's downstream performance by a large margin.
We propose GEnder Equality Prompt (GEEP) to improve gender fairness of pre-trained models with less forgetting.
arXiv Detail & Related papers (2021-10-11T15:52:16Z)
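The GEEP summary above points to prompt-based tuning that leaves pre-trained weights untouched; a generic prompt-tuning sketch in that spirit follows. The prompt length, training sentence, and learning rate are assumptions, not details from the paper.
```python
# Illustrative prompt-tuning sketch: freeze all pre-trained weights (so
# nothing is forgotten) and train only a small set of new prompt embeddings.
import torch
import torch.nn as nn
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
for p in model.parameters():
    p.requires_grad_(False)          # frozen: avoids catastrophic forgetting

n_prompts, dim = 8, model.config.hidden_size
prompt_embeddings = nn.Parameter(torch.randn(1, n_prompts, dim) * 0.02)
optimizer = torch.optim.AdamW([prompt_embeddings], lr=1e-3)

batch = tokenizer("she is a doctor", return_tensors="pt")
token_embeds = model.get_input_embeddings()(batch["input_ids"])
inputs_embeds = torch.cat([prompt_embeddings, token_embeds], dim=1)
# Labels: ignore (-100) the prompt positions, predict the real tokens.
labels = torch.cat(
    [torch.full((1, n_prompts), -100, dtype=torch.long), batch["input_ids"]],
    dim=1)
loss = model(inputs_embeds=inputs_embeds, labels=labels).loss
loss.backward()
optimizer.step()
```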
- Adversarial Examples Generation for Reducing Implicit Gender Bias in Pre-trained Models [2.6329024988388925]
We propose a method to automatically generate implicit gender-bias samples at the sentence level, together with a metric to measure gender bias.
The metric is used to guide the generation of examples from pre-trained models; these examples can then be used to mount attacks on pre-trained models.
arXiv Detail & Related papers (2021-10-03T20:22:54Z)
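As a rough illustration of a sentence-level gender-bias probe (an assumed stand-in, not the paper's metric), one can compare the probabilities a masked language model assigns to "he" versus "she" at a masked position:
```python
# Simple gender-bias probe: a large probability gap between "he" and "she"
# at the [MASK] position signals a gendered preference in the model.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def pronoun_gap(template: str) -> float:
    """Return P(he) - P(she) at the [MASK] position of the template."""
    batch = tokenizer(template, return_tensors="pt")
    mask_pos = (batch["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        probs = model(**batch).logits[0, mask_pos].softmax(dim=-1)
    he, she = tokenizer.convert_tokens_to_ids(["he", "she"])
    return (probs[he] - probs[she]).item()

# e.g. a negative gap here would mean the model prefers "she" in this context
print(pronoun_gap("[MASK] is a nurse."))
```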
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
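The summary above leaves the training method unnamed; one well-known way to make a model ignore correlations that a shallow model already exploits is a product-of-experts combination with a weak, bias-prone expert. The models and data below are stand-ins.
```python
# Hedged sketch of product-of-experts debiasing: combine the main model's
# log-probabilities with a weak model's during training, so the main model
# gets little credit for examples solved via shallow correlations.
import torch
import torch.nn as nn
import torch.nn.functional as F

main_model = nn.Linear(16, 3)      # stand-in for the full NLP model
weak_model = nn.Linear(16, 3)      # stand-in for a limited-capacity model
for p in weak_model.parameters():
    p.requires_grad_(False)        # the weak model is trained separately

def poe_loss(x, y):
    main_logp = F.log_softmax(main_model(x), dim=-1)
    weak_logp = F.log_softmax(weak_model(x), dim=-1)
    # Product of experts = sum of log-probabilities before normalization.
    return F.cross_entropy(main_logp + weak_logp, y)

optimizer = torch.optim.SGD(main_model.parameters(), lr=1e-2)
x, y = torch.randn(8, 16), torch.randint(0, 3, (8,))
poe_loss(x, y).backward()
optimizer.step()
# At inference time, only main_model is used, without the weak expert.
```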