Exploiting Biased Models to De-bias Text: A Gender-Fair Rewriting Model
- URL: http://arxiv.org/abs/2305.11140v1
- Date: Thu, 18 May 2023 17:35:28 GMT
- Title: Exploiting Biased Models to De-bias Text: A Gender-Fair Rewriting Model
- Authors: Chantal Amrhein, Florian Schottmann, Rico Sennrich and Samuel Läubli
- Abstract summary: We train a rewriting model for German without the need for elaborate handcrafted rules.
The outputs of this model increased gender-fairness as shown in a human evaluation study.
- Score: 32.21372089380992
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural language generation models reproduce and often amplify the biases
present in their training data. Previous research explored using
sequence-to-sequence rewriting models to transform biased model outputs (or
original texts) into more gender-fair language by creating pseudo training data
through linguistic rules. However, this approach is not practical for languages
with more complex morphology than English. We hypothesise that creating
training data in the reverse direction, i.e. starting from gender-fair text, is
easier for morphologically complex languages and show that it matches the
performance of state-of-the-art rewriting models for English. To eliminate the
rule-based nature of data creation, we instead propose using machine
translation models to create gender-biased text from real gender-fair text via
round-trip translation. Our approach allows us to train a rewriting model for
German without the need for elaborate handcrafted rules. The outputs of this
model increased gender-fairness as shown in a human evaluation study.
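The round-trip idea from the abstract can be sketched in a few lines of code. The snippet below is a minimal illustration, not the authors' released pipeline: it assumes off-the-shelf Hugging Face MarianMT models (Helsinki-NLP/opus-mt-de-en, Helsinki-NLP/opus-mt-en-de) to translate gender-fair German into English and back, and pairs the round-trip output, which tends to fall back to generic masculine forms, with the original gender-fair sentence as pseudo training data for a rewriting model.

```python
# Minimal sketch of round-trip translation for pseudo training data:
# gender-fair German -> English -> German. The round trip typically loses
# gender-fair forms (e.g. "Mitarbeiter*innen"), yielding a biased source
# sentence that can be paired with the original gender-fair target.
# Model names are assumptions; the paper does not prescribe a specific MT system.
from transformers import MarianMTModel, MarianTokenizer

DE_EN = "Helsinki-NLP/opus-mt-de-en"
EN_DE = "Helsinki-NLP/opus-mt-en-de"

tok_de_en = MarianTokenizer.from_pretrained(DE_EN)
mt_de_en = MarianMTModel.from_pretrained(DE_EN)
tok_en_de = MarianTokenizer.from_pretrained(EN_DE)
mt_en_de = MarianMTModel.from_pretrained(EN_DE)


def translate(sentences, tokenizer, model):
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch, max_length=256)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


def make_pseudo_pairs(gender_fair_de):
    """Round-trip DE -> EN -> DE and pair the (likely biased) output with the fair source."""
    english = translate(gender_fair_de, tok_de_en, mt_de_en)
    round_trip_de = translate(english, tok_en_de, mt_en_de)
    # Source side: round-trip (biased) German; target side: original gender-fair German.
    return list(zip(round_trip_de, gender_fair_de))


# Illustrative example sentence (not from the paper's data).
pairs = make_pseudo_pairs(["Alle Mitarbeiter*innen sind zur Versammlung eingeladen."])
for biased, fair in pairs:
    print(f"pseudo source (biased): {biased}")
    print(f"target (gender-fair):   {fair}")
```

The resulting (biased, fair) pairs could then be used to fine-tune any standard sequence-to-sequence model; the specific MT systems, filtering heuristics, and example sentence here are illustrative assumptions.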
Related papers
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
- Using Artificial French Data to Understand the Emergence of Gender Bias in Transformer Language Models [5.22145960878624]
This work takes an initial step towards exploring the less researched topic of how neural models discover linguistic properties of words, such as gender, as well as the rules governing their usage.
We propose to use an artificial corpus generated by a PCFG based on French to precisely control the gender distribution in the training data and determine under which conditions a model correctly captures gender information or, on the contrary, appears gender-biased.
arXiv Detail & Related papers (2023-10-24T14:08:37Z)
- Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender Perturbation over Fairytale Texts [87.62403265382734]
Recent studies show that traditional fairytales are rife with harmful gender biases.
This work aims to assess learned biases of language models by evaluating their robustness against gender perturbations.
arXiv Detail & Related papers (2023-10-16T22:25:09Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- The Birth of Bias: A case study on the evolution of gender bias in an English language model [1.6344851071810076]
We use a relatively small language model with the LSTM architecture, trained on an English Wikipedia corpus.
We find that the representation of gender is dynamic and identify different phases during training.
We show that gender information is represented increasingly locally in the input embeddings of the model.
arXiv Detail & Related papers (2022-07-21T00:59:04Z)
- Using Natural Sentences for Understanding Biases in Language Models [10.604991889372092]
We create a prompt dataset with respect to occupations collected from real-world natural sentences in Wikipedia.
We find bias evaluations are very sensitive to the design choices of template prompts.
We propose using natural sentence prompts for systematic evaluations to step away from design choices that could introduce bias in the observations.
arXiv Detail & Related papers (2022-05-12T18:36:33Z)
- Training Language Models with Natural Language Feedback [51.36137482891037]
We learn from language feedback on model outputs using a three-step learning algorithm.
In synthetic experiments, we first evaluate whether language models accurately incorporate feedback to produce refinements.
Using only 100 samples of human-written feedback, our learning algorithm finetunes a GPT-3 model to roughly human-level summarization.
arXiv Detail & Related papers (2022-04-29T15:06:58Z)
- Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal [74.52580517012832]
Language models can be biased in multiple ways, including associating male and female genders with gender-neutral professions.
We present a novel approach to mitigate gender disparity in distilled language models based on counterfactual role reversal.
We observe that models with reduced gender polarity in language generation do not improve embedding fairness or downstream classification fairness.
arXiv Detail & Related papers (2022-03-23T17:34:35Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.