Language Models Get a Gender Makeover: Mitigating Gender Bias with
Few-Shot Data Interventions
- URL: http://arxiv.org/abs/2306.04597v1
- Date: Wed, 7 Jun 2023 16:50:03 GMT
- Title: Language Models Get a Gender Makeover: Mitigating Gender Bias with
Few-Shot Data Interventions
- Authors: Himanshu Thakur, Atishay Jain, Praneetha Vaddamanu, Paul Pu Liang and
Louis-Philippe Morency
- Abstract summary: Societal biases present in pre-trained large language models are a critical issue.
We propose data intervention strategies as a powerful yet simple technique to reduce gender bias in pre-trained models.
- Score: 50.67412723291881
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Societal biases present in pre-trained large language models are a critical
issue as these models have been shown to propagate biases in countless
downstream applications, rendering them unfair towards specific groups of
people. Since large-scale retraining of these models from scratch is both time-
and compute-expensive, a variety of approaches have been previously proposed
that de-bias a pre-trained model. While the majority of current
state-of-the-art debiasing methods focus on changes to the training regime, in
this paper, we propose data intervention strategies as a powerful yet simple
technique to reduce gender bias in pre-trained models. Specifically, we
empirically show that by fine-tuning a pre-trained model on only 10 de-biased
(intervened) training examples, the tendency to favor any gender is
significantly reduced. Since our proposed method only needs a few training
examples, our few-shot debiasing approach is highly feasible and practical.
Through extensive experimentation, we show that our debiasing technique
performs better than competitive state-of-the-art baselines with minimal loss
in language modeling ability.
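The abstract's key claim, that fine-tuning on only about ten gender-balanced (intervened) examples measurably reduces the tendency to favor either gender, can be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' released code: the choice of bert-base-uncased, the ten example sentences, and the training hyperparameters are placeholders standing in for whatever intervention strategy and setup the paper actually uses.

```python
# Minimal sketch of few-shot debiasing: fine-tune a pre-trained masked LM on a
# handful of "intervened" (gender-balanced) sentences. Model, sentences, and
# hyperparameters below are illustrative assumptions, not the paper's exact setup.
import torch
from torch.utils.data import DataLoader
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)

model_name = "bert-base-uncased"  # assumed pre-trained LM for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# ~10 intervened training examples: each stereotypical association is paired
# with its gender-swapped counterpart so that neither gender is favored.
intervened_sentences = [
    "The nurse said that he would check on the patient.",
    "The nurse said that she would check on the patient.",
    "The engineer finished her design ahead of schedule.",
    "The engineer finished his design ahead of schedule.",
    "My brother loves babysitting his nephew.",
    "My sister loves babysitting her nephew.",
    "The CEO thanked her assistant for his hard work.",
    "The CEO thanked his assistant for her hard work.",
    "The receptionist greeted him warmly.",
    "The receptionist greeted her warmly.",
]

encodings = tokenizer(intervened_sentences, truncation=True, padding=True,
                      return_tensors="pt")
dataset = [{"input_ids": ids, "attention_mask": mask}
           for ids, mask in zip(encodings["input_ids"],
                                encodings["attention_mask"])]

# Standard masked-language-modeling fine-tuning on the few intervened examples.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm_probability=0.15)
loader = DataLoader(dataset, batch_size=2, shuffle=True, collate_fn=collator)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(3):  # a few epochs suffice given the tiny training set
    for batch in loader:
        outputs = model(**batch)
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

After such few-shot fine-tuning, one would re-measure gender bias with a standard stereotype benchmark and check that perplexity on held-out text is largely unchanged, mirroring the abstract's claim of minimal loss in language modeling ability.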
Related papers
- REFINE-LM: Mitigating Language Model Stereotypes via Reinforcement Learning [18.064064773660174]
We introduce REFINE-LM, a debiasing method that uses reinforcement learning to handle different types of biases without any fine-tuning.
By training a simple model on top of the word probability distribution of an LM, our bias reinforcement learning method enables model debiasing without human annotations.
Experiments conducted on a wide range of models, including several LMs, show that our method significantly reduces stereotypical biases while preserving LM performance.
arXiv Detail & Related papers (2024-08-18T14:08:31Z) - MABR: A Multilayer Adversarial Bias Removal Approach Without Prior Bias Knowledge [6.208151505901749]
Models trained on real-world data often mirror and exacerbate existing social biases.
We introduce a novel adversarial training strategy that operates independently of prior bias-type knowledge.
Our method effectively reduces social biases without the need for demographic annotations.
arXiv Detail & Related papers (2024-08-10T09:11:01Z) - Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z) - Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z) - Optimising Equal Opportunity Fairness in Model Training [60.0947291284978]
Existing debiasing methods, such as adversarial training and removing protected information from representations, have been shown to reduce bias.
We propose two novel training objectives which directly optimise for the widely-used criterion of equal opportunity, and show that they are effective in reducing bias while maintaining high performance over two classification tasks.
arXiv Detail & Related papers (2022-05-05T01:57:58Z) - Improving Gender Fairness of Pre-Trained Language Models without
Catastrophic Forgetting [88.83117372793737]
Forgetting information in the original training data may damage the model's downstream performance by a large margin.
We propose GEnder Equality Prompt (GEEP) to improve gender fairness of pre-trained models with less forgetting.
arXiv Detail & Related papers (2021-10-11T15:52:16Z) - Adversarial Examples Generation for Reducing Implicit Gender Bias in
Pre-trained Models [2.6329024988388925]
We propose a method to automatically generate implicit gender bias samples at the sentence level and a metric to measure gender bias.
The metric is used to guide the generation of examples from pre-trained models; those examples can then be used to mount attacks on pre-trained models.
arXiv Detail & Related papers (2021-10-03T20:22:54Z) - Learning from others' mistakes: Avoiding dataset biases without modeling
them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.