Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation
- URL: http://arxiv.org/abs/2009.09435v4
- Date: Wed, 22 May 2024 14:51:24 GMT
- Title: Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation
- Authors: Francisco Vargas, Ryan Cotterell,
- Abstract summary: Bolukbasi et al. present one of the first gender bias mitigation techniques for word representations.
We generalize their method to a kernelized, nonlinear version.
We analyze empirically whether the bias subspace is actually linear.
- Score: 57.292988892028134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bolukbasi et al. (2016) presents one of the first gender bias mitigation techniques for word representations. Their method takes pre-trained word representations as input and attempts to isolate a linear subspace that captures most of the gender bias in the representations. As judged by an analogical evaluation task, their method virtually eliminates gender bias in the representations. However, an implicit and untested assumption of their method is that the bias subspace is actually linear. In this work, we generalize their method to a kernelized, nonlinear version. We take inspiration from kernel principal component analysis and derive a nonlinear bias isolation technique. We discuss and overcome some of the practical drawbacks of our method for non-linear gender bias mitigation in word representations and analyze empirically whether the bias subspace is actually linear. Our analysis shows that gender bias is in fact well captured by a linear subspace, justifying the assumption of Bolukbasi et al. (2016).
Related papers
- Disclosure and Mitigation of Gender Bias in LLMs [64.79319733514266]
Large Language Models (LLMs) can generate biased responses.
We propose an indirect probing framework based on conditional generation.
We explore three distinct strategies to disclose explicit and implicit gender bias in LLMs.
arXiv Detail & Related papers (2024-02-17T04:48:55Z) - Linear Adversarial Concept Erasure [108.37226654006153]
We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept.
We show that the method is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
arXiv Detail & Related papers (2022-01-28T13:00:17Z) - Robustness and Reliability of Gender Bias Assessment in Word Embeddings:
The Role of Base Pairs [23.574442657224008]
It has been shown that word embeddings can exhibit gender bias, and various methods have been proposed to quantify this.
Previous work has leveraged gender word pairs to measure bias and extract biased analogies.
We show that the reliance on these gendered pairs has strong limitations.
In particular, the well-known analogy "man is to computer-programmer as woman is to homemaker" is due to word similarity rather than societal bias.
arXiv Detail & Related papers (2020-10-06T16:09:05Z) - MDR Cluster-Debias: A Nonlinear WordEmbedding Debiasing Pipeline [3.180013942295509]
Existing methods for debiasing word embeddings often do so only superficially, in that words that are stereotypically associated with a particular gender can still be clustered together in the debiased space.
This paper explores why this residual clustering exists, and how it might be addressed.
We identify two potential reasons for which residual bias exists and develop a new pipeline, MDR Cluster-Debias, to mitigate this bias.
arXiv Detail & Related papers (2020-06-20T20:03:07Z) - Nurse is Closer to Woman than Surgeon? Mitigating Gender-Biased
Proximities in Word Embeddings [37.65897382453336]
Existing post-processing methods for debiasing word embeddings are unable to mitigate gender bias hidden in the spatial arrangement of word vectors.
We propose RAN-Debias, a novel gender debiasing methodology which eliminates the bias present in a word vector but also alters the spatial distribution of its neighbouring vectors.
We also propose a new bias evaluation metric - Gender-based Illicit Proximity Estimate (GIPE)
arXiv Detail & Related papers (2020-06-02T20:50:43Z) - Mitigating Gender Bias Amplification in Distribution by Posterior
Regularization [75.3529537096899]
We investigate the gender bias amplification issue from the distribution perspective.
We propose a bias mitigation approach based on posterior regularization.
Our study sheds the light on understanding the bias amplification.
arXiv Detail & Related papers (2020-05-13T11:07:10Z) - Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation [94.98656228690233]
We propose a technique that purifies the word embeddings against corpus regularities prior to inferring and removing the gender subspace.
Our approach preserves the distributional semantics of the pre-trained word embeddings while reducing gender bias to a significantly larger degree than prior approaches.
arXiv Detail & Related papers (2020-05-03T02:33:20Z) - Null It Out: Guarding Protected Attributes by Iterative Nullspace
Projection [51.041763676948705]
Iterative Null-space Projection (INLP) is a novel method for removing information from neural representations.
We show that our method is able to mitigate bias in word embeddings, as well as to increase fairness in a setting of multi-class classification.
arXiv Detail & Related papers (2020-04-16T14:02:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.