How Far Can It Go?: On Intrinsic Gender Bias Mitigation for Text Classification
- URL: http://arxiv.org/abs/2301.12855v1
- Date: Mon, 30 Jan 2023 13:05:48 GMT
- Title: How Far Can It Go?: On Intrinsic Gender Bias Mitigation for Text Classification
- Authors: Ewoenam Tokpo, Pieter Delobelle, Bettina Berendt and Toon Calders
- Abstract summary: We investigate the effects that some of the major intrinsic gender bias mitigation strategies have on downstream text classification tasks.
We show that each mitigation technique is able to hide the bias from some of the intrinsic bias measures but not all.
We recommend that intrinsic bias mitigation techniques should be combined with other fairness interventions for downstream tasks.
- Score: 12.165921897192902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To mitigate gender bias in contextualized language models, different
intrinsic mitigation strategies have been proposed, alongside many bias
metrics. Considering that the end use of these language models is for
downstream tasks like text classification, it is important to understand how
these intrinsic bias mitigation strategies actually translate to fairness in
downstream tasks, and to what extent. In this work, we design a probe to
investigate the effects that some of the major intrinsic gender bias mitigation
strategies have on downstream text classification tasks. We discover that
instead of resolving gender bias, intrinsic mitigation techniques and metrics
are able to hide it in such a way that significant gender information is
retained in the embeddings. Furthermore, we show that each mitigation technique
is able to hide the bias from some of the intrinsic bias measures but not all,
and each intrinsic bias measure can be fooled by some mitigation techniques,
but not all. We confirm experimentally that none of the intrinsic mitigation
techniques, when used without any other fairness intervention, is able to
consistently impact extrinsic bias. We recommend that intrinsic bias mitigation
should be combined with other fairness interventions for downstream tasks.
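The probing idea behind the paper's finding can be sketched simply: train a small classifier on frozen embeddings to predict gender; if it stays well above chance after mitigation, gender information was hidden rather than removed. Below is a minimal, hedged sketch on synthetic data (the embedding dimensions, sample counts, and the injected "gender direction" are illustrative assumptions, not the paper's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for sentence embeddings: 200 samples, 16 dims,
# with gender information injected along dimension 0.
n, d = 200, 16
labels = rng.integers(0, 2, size=n).astype(float)   # 0/1 gender label
X = rng.normal(size=(n, d))
X[:, 0] += 2.0 * (labels - 0.5)                     # class means at -1 / +1

def train_probe(X, y, lr=0.1, steps=500):
    """Logistic-regression probe trained with plain gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))      # sigmoid predictions
        w -= lr * (X.T @ (p - y)) / len(y)          # gradient of log-loss
        b -= lr * np.mean(p - y)
    return w, b

w, b = train_probe(X, labels)
acc = np.mean(((X @ w + b) > 0) == labels)
print(f"probe accuracy: {acc:.2f}")  # well above 0.5 -> gender info retained
```

In a real experiment, `X` would be the (debiased) contextual embeddings and the probe's above-chance accuracy would be the evidence that mitigation hid, rather than erased, the gender signal.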
Related papers
- Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation [19.06428714669272]
We systematically test how methods for intrinsic debiasing affect neural machine translation models.
We highlight three challenges and mismatches between the debiasing techniques and their end-goal usage.
arXiv Detail & Related papers (2024-06-02T15:57:29Z)
- How to be fair? A study of label and selection bias [3.018638214344819]
It is widely accepted that biased data leads to biased and potentially unfair models.
Several measures for bias in data and model predictions have been proposed, as well as bias mitigation techniques.
Despite the myriad of mitigation techniques developed in the past decade, it is still poorly understood under what circumstances which methods work.
arXiv Detail & Related papers (2024-03-21T10:43:55Z)
- Explaining Knock-on Effects of Bias Mitigation [13.46387356280467]
In machine learning systems, bias mitigation approaches aim to make outcomes fairer across privileged and unprivileged groups.
In this paper, we aim to characterise impacted cohorts when mitigation interventions are applied.
We examine a range of bias mitigation strategies that work at various stages of the model life cycle.
We show that all tested mitigation strategies negatively impact a non-trivial fraction of cases, i.e., people who receive unfavourable outcomes solely on account of mitigation efforts.
arXiv Detail & Related papers (2023-12-01T18:40:37Z)
- Do Not Harm Protected Groups in Debiasing Language Representation Models [2.9057513016551244]
Language Representation Models (LRMs) trained with real-world data may capture and exacerbate undesired bias.
We examine four debiasing techniques on a real-world text classification task and show that reducing bias comes at the cost of degraded performance for all demographic groups.
arXiv Detail & Related papers (2023-10-27T20:11:38Z)
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide-range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z)
- Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous Pronouns [53.62845317039185]
Bias-measuring datasets play a critical role in detecting biased behavior of language models.
We propose a novel method to collect diverse, natural, and minimally distant text pairs via counterfactual generation.
We show that four pre-trained language models are significantly more inconsistent across different gender groups than within each group.
arXiv Detail & Related papers (2023-02-11T12:11:03Z)
- Testing Occupational Gender Bias in Language Models: Towards Robust Measurement and Zero-Shot Debiasing [98.07536837448293]
Large language models (LLMs) have been shown to exhibit a variety of harmful, human-like biases against various demographics.
We introduce a list of desiderata for robustly measuring biases in generative language models.
We then use this benchmark to test several state-of-the-art open-source LLMs, including Llama, Mistral, and their instruction-tuned versions.
arXiv Detail & Related papers (2022-12-20T22:41:24Z)
- MABEL: Attenuating Gender Bias using Textual Entailment Data [20.489427903240017]
We propose MABEL, an intermediate pre-training approach for mitigating gender bias in contextualized representations.
Key to our approach is the use of a contrastive learning objective on counterfactually augmented, gender-balanced entailment pairs.
We show that MABEL outperforms previous task-agnostic debiasing approaches in terms of fairness.
arXiv Detail & Related papers (2022-10-26T18:36:58Z)
- Information-Theoretic Bias Reduction via Causal View of Spurious Correlation [71.9123886505321]
We propose an information-theoretic bias measurement technique through a causal interpretation of spurious correlation.
We present a novel debiasing framework against the algorithmic bias, which incorporates a bias regularization loss.
The proposed bias measurement and debiasing approaches are validated in diverse realistic scenarios.
arXiv Detail & Related papers (2022-01-10T01:19:31Z)
- Towards causal benchmarking of bias in face analysis algorithms [54.19499274513654]
We develop an experimental method for measuring algorithmic bias of face analysis algorithms.
Our proposed method is based on generating synthetic "transects" of matched sample images.
We validate our method by comparing it to a study that employs the traditional observational method for analyzing bias in gender classification algorithms.
arXiv Detail & Related papers (2020-07-13T17:10:34Z)
- Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation [94.98656228690233]
We propose a technique that purifies the word embeddings against corpus regularities prior to inferring and removing the gender subspace.
Our approach preserves the distributional semantics of the pre-trained word embeddings while reducing gender bias to a significantly larger degree than prior approaches.
arXiv Detail & Related papers (2020-05-03T02:33:20Z)
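The hard-debias step that approaches like Double-Hard Debias build on can be sketched as a projection: estimate a gender direction from definitional word pairs, then subtract each embedding's component along that direction. The vectors and pairs below are toy values for illustration only, and this sketch omits the frequency-purification step that Double-Hard Debias adds:

```python
import numpy as np

def gender_direction(pairs):
    """Dominant direction of the (male_vec - female_vec) difference vectors."""
    diffs = np.array([a - b for a, b in pairs])
    _, _, vt = np.linalg.svd(diffs)   # rows of vt are unit right-singular vectors
    return vt[0]

def debias(emb, g):
    """Remove the component of emb along the unit gender direction g."""
    return emb - (emb @ g) * g

# Toy 4-d vectors standing in for word embeddings (illustrative only).
he,  she   = np.array([1.0, 0.0, 0.2, 0.0]), np.array([-1.0, 0.0, 0.2, 0.0])
man, woman = np.array([0.9, 0.1, 0.3, 0.0]), np.array([-0.9, 0.1, 0.3, 0.0])
word = np.array([0.7, 0.3, 0.1, 0.5])

g = gender_direction([(he, she), (man, woman)])
word_db = debias(word, g)
print(abs(word_db @ g))   # ~0: no remaining component along g
```

The projection guarantees zero correlation with the estimated direction, which is exactly why, as the main paper argues, gender information can survive in the remaining subspace even when such a measure reports no bias.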
This list is automatically generated from the titles and abstracts of the papers in this site.