The Birth of Bias: A case study on the evolution of gender bias in an
English language model
- URL: http://arxiv.org/abs/2207.10245v1
- Date: Thu, 21 Jul 2022 00:59:04 GMT
- Title: The Birth of Bias: A case study on the evolution of gender bias in an
English language model
- Authors: Oskar van der Wal, Jaap Jumelet, Katrin Schulz, Willem Zuidema
- Abstract summary: We use a relatively small language model with an LSTM architecture, trained on an English Wikipedia corpus.
We find that the representation of gender is dynamic and identify different phases during training.
We show that gender information is represented increasingly locally in the input embeddings of the model.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting and mitigating harmful biases in modern language models are widely
recognized as crucial, open problems. In this paper, we take a step back and
investigate how language models come to be biased in the first place. We use a
relatively small language model with an LSTM architecture, trained on an
English Wikipedia corpus. With full access to the data and to the model
parameters as they change during every step while training, we can map in
detail how the representation of gender develops, what patterns in the dataset
drive this, and how the model's internal state relates to the bias in a
downstream task (semantic textual similarity). We find that the representation
of gender is dynamic and identify different phases during training.
Furthermore, we show that gender information is represented increasingly
locally in the input embeddings of the model and that, as a consequence,
debiasing these can be effective in reducing the downstream bias. Monitoring
the training dynamics allows us to detect an asymmetry in how the female and
male genders are represented in the input embeddings. This is important, as it
may cause naive mitigation strategies to introduce new undesirable biases. We
discuss the relevance of the findings for mitigation strategies more generally
and the prospects of generalizing our methods to larger language models, the
Transformer architecture, other languages and other undesirable biases.
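The finding that gender information becomes increasingly locally represented in the input embeddings motivates a projection-style mitigation: remove from each embedding its component along an estimated gender direction. The sketch below is illustrative only; the pair-difference estimate of the direction and the toy vectors are assumptions, not the authors' exact procedure.

```python
import numpy as np

def gender_direction(emb, pairs):
    # Estimate a gender direction as the mean of difference vectors
    # between male/female word pairs, normalized to unit length.
    diffs = [emb[m] - emb[f] for m, f in pairs]
    d = np.mean(diffs, axis=0)
    return d / np.linalg.norm(d)

def debias(vec, direction):
    # Project out the gender component: v - <v, d> d.
    return vec - np.dot(vec, direction) * direction

# Toy 4-dimensional embeddings (illustrative values only).
emb = {
    "he":    np.array([ 1.0, 0.2, 0.0, 0.1]),
    "she":   np.array([-1.0, 0.2, 0.0, 0.1]),
    "man":   np.array([ 0.9, 0.1, 0.3, 0.0]),
    "woman": np.array([-0.9, 0.1, 0.3, 0.0]),
    "nurse": np.array([-0.5, 0.4, 0.8, 0.2]),
}
d = gender_direction(emb, [("he", "she"), ("man", "woman")])
nurse_db = debias(emb["nurse"], d)
# nurse_db carries (numerically) no component along the gender direction.
```

As the paper's asymmetry finding warns, removing a single averaged direction can itself be too naive if female and male terms are not represented symmetrically, so the choice of pairs matters.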
Related papers
- Understanding the Interplay of Scale, Data, and Bias in Language Models: A Case Study with BERT [4.807994469764776]
We study the influence of model scale and pre-training data on a language model's learnt social biases.
Our experiments show that pre-training data substantially influences how upstream biases evolve with model scale.
We shed light on the complex interplay of data and model scale, and investigate how it translates to concrete biases.
arXiv Detail & Related papers (2024-07-25T23:09:33Z)
- Less can be more: representational vs. stereotypical gender bias in facial expression recognition [3.9698529891342207]
Machine learning models can inherit biases from their training data, leading to discriminatory or inaccurate predictions.
This paper investigates the propagation of demographic biases from datasets into machine learning models.
We focus on the gender demographic component, analyzing two types of bias: representational and stereotypical.
arXiv Detail & Related papers (2024-06-25T09:26:49Z)
- Locating and Mitigating Gender Bias in Large Language Models [40.78150878350479]
Large language models (LLMs) are pre-trained on extensive corpora that encode facts and aspects of human cognition, including human preferences.
This process can inadvertently lead to these models acquiring biases and prevalent stereotypes in society.
We propose the LSDM (Least Square Debias Method), a knowledge-editing based method for mitigating gender bias in occupational pronouns.
arXiv Detail & Related papers (2024-03-21T13:57:43Z)
- Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
- Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions [50.67412723291881]
Societal biases present in pre-trained large language models are a critical issue.
We propose data intervention strategies as a powerful yet simple technique to reduce gender bias in pre-trained models.
arXiv Detail & Related papers (2023-06-07T16:50:03Z)
- Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
arXiv Detail & Related papers (2023-05-24T04:27:40Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
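Projecting out biased directions from a text embedding can be sketched with a plain orthogonal projection matrix P = I - V V^T, where the columns of V form an orthonormal basis of the bias subspace. The calibration step the paper describes is omitted here, and the bias directions and input vector are toy values.

```python
import numpy as np

def projection_matrix(bias_dirs):
    # Orthonormalize the (linearly independent) bias directions via QR,
    # then build P = I - V V^T, which maps any embedding onto the
    # subspace orthogonal to all of them.
    V, _ = np.linalg.qr(np.array(bias_dirs).T)  # columns span the bias subspace
    return np.eye(V.shape[0]) - V @ V.T

# Two toy bias directions in a 3-dimensional embedding space.
dirs = [np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])]
P = projection_matrix(dirs)

z = np.array([0.3, -0.7, 0.5])
z_db = P @ z  # components along both bias directions removed
```

Because P is an orthogonal projection (symmetric and idempotent), applying it to already-debiased embeddings is a no-op, which makes it cheap to insert into an existing embedding pipeline.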
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Exploring Gender Bias in Retrieval Models [2.594412743115663]
Mitigating gender bias in information retrieval is important to avoid propagating stereotypes.
We employ a dataset consisting of two components: (1) relevance of a document to a query and (2) "gender" of a document.
We show that pre-trained models for IR do not perform well in zero-shot retrieval tasks when full fine-tuning of a large pre-trained BERT encoder is performed.
We also illustrate that pre-trained models carry gender biases that cause retrieved articles to be about males more often than females.
arXiv Detail & Related papers (2022-08-02T21:12:05Z)
- Examining Covert Gender Bias: A Case Study in Turkish and English Machine Translation Models [7.648784748888186]
We examine cases of both overt and covert gender bias in Machine Translation models.
Specifically, we introduce a method to investigate asymmetrical gender markings.
We also assess bias in the attribution of personhood and examine occupational and personality stereotypes.
arXiv Detail & Related papers (2021-08-23T19:25:56Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.