Detecting and Mitigating Indirect Stereotypes in Word Embeddings
- URL: http://arxiv.org/abs/2305.14574v1
- Date: Tue, 23 May 2023 23:23:49 GMT
- Title: Detecting and Mitigating Indirect Stereotypes in Word Embeddings
- Authors: Erin George, Joyce Chew, Deanna Needell
- Abstract summary: Societal biases in the usage of words, including harmful stereotypes, are frequently learned by common word embedding methods.
We propose a novel method called Biased Indirect Relationship Modification (BIRM) to mitigate indirect bias in distributional word embeddings.
- Score: 6.428026202398116
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Societal biases in the usage of words, including harmful stereotypes, are
frequently learned by common word embedding methods. These biases manifest not
only between a word and an explicit marker of its stereotype, but also between
words that share related stereotypes. This latter phenomenon, sometimes called
"indirect bias,'' has resisted prior attempts at debiasing. In this paper, we
propose a novel method called Biased Indirect Relationship Modification (BIRM)
to mitigate indirect bias in distributional word embeddings by modifying biased
relationships between words before embeddings are learned. This is done by
considering how the co-occurrence probability of a given pair of words changes
in the presence of words marking an attribute of bias, and using this to
average out the effect of a bias attribute. To evaluate this method, we perform
a series of common tests and demonstrate that measures of bias in the word
embeddings are reduced in exchange for a minor reduction in the semantic quality
of the embeddings. In addition, we conduct novel tests for measuring indirect
stereotypes by extending the Word Embedding Association Test (WEAT) with new
test sets for indirect binary gender stereotypes. With these tests, we
demonstrate the presence of more subtle stereotypes not addressed by previous
work. The proposed method is able to reduce the presence of some of these new
stereotypes, serving as a crucial next step towards non-stereotyped word
embeddings.
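The abstract describes BIRM only at a high level: estimate how the co-occurrence probability of a word pair changes when words marking a bias attribute appear in the context, then average that effect out of the statistics before the embedding is trained. The sketch below illustrates that general recipe on raw text; the attribute word list, the windowed counting, and the simple two-way averaging are illustrative assumptions, not the paper's exact formulation.
```python
# A minimal sketch of the co-occurrence averaging idea described above, NOT the
# paper's exact BIRM formulation: the attribute list, window size, and two-way
# averaging are illustrative assumptions.
from collections import defaultdict

ATTRIBUTE_WORDS = {"he", "she", "him", "her", "his", "hers"}  # assumed bias markers
WINDOW = 5  # context window size (assumed)

def conditional_cooccurrence(sentences):
    """Count pair co-occurrences separately for context windows that do and
    do not contain an attribute word."""
    counts = {True: defaultdict(float), False: defaultdict(float)}
    totals = {True: defaultdict(float), False: defaultdict(float)}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            window = tokens[max(0, i - WINDOW): i + WINDOW + 1]
            has_attr = any(t in ATTRIBUTE_WORDS for t in window)
            for context in window:
                if context != word:
                    counts[has_attr][(word, context)] += 1.0
                    totals[has_attr][word] += 1.0
    return counts, totals

def averaged_probability(counts, totals, word, context):
    """Average P(context | word, attribute present) and
    P(context | word, attribute absent), so the prevalence of the attribute
    words no longer skews the pair statistic fed to the embedding model."""
    probs = []
    for has_attr in (True, False):
        if totals[has_attr][word] > 0:
            probs.append(counts[has_attr][(word, context)] / totals[has_attr][word])
    return sum(probs) / len(probs) if probs else 0.0

corpus = [
    "she is a caring nurse".split(),
    "the nurse treated the patient".split(),
    "he is a skilled engineer".split(),
]
counts, totals = conditional_cooccurrence(corpus)
print(averaged_probability(counts, totals, "nurse", "caring"))  # 0.125 on this toy corpus
```
The evaluation extends the Word Embedding Association Test (WEAT). For reference, the standard WEAT effect size (Caliskan et al., 2017) compares two target word sets X and Y against two attribute sets A and B via differences of mean cosine similarities; a minimal NumPy version, with vectors supplied by the caller, is:
```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean cosine similarity of w to attribute set A minus its
    mean cosine similarity to attribute set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Standard WEAT effect size: difference of mean associations of the two
    target sets, normalized by the standard deviation over all target words."""
    assoc_x = [association(x, A, B) for x in X]
    assoc_y = [association(y, A, B) for y in Y]
    return (np.mean(assoc_x) - np.mean(assoc_y)) / np.std(assoc_x + assoc_y, ddof=1)
```
Here X and Y would be target word vectors (e.g., occupation terms) and A and B the attribute vectors (e.g., male- and female-associated words); the paper's new test sets apply the same machinery to indirect binary gender stereotypes.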
Related papers
- Debiasing Sentence Embedders through Contrastive Word Pairs [46.9044612783003]
We explore an approach to remove linear and nonlinear bias information for NLP solutions.
We compare our approach to common debiasing methods on classical bias metrics and on bias metrics which take nonlinear information into account.
arXiv Detail & Related papers (2024-03-27T13:34:59Z)
- Looking at the Overlooked: An Analysis on the Word-Overlap Bias in Natural Language Inference [20.112129592923246]
We focus on an overlooked aspect of the overlap bias in NLI models: the reverse word-overlap bias.
Current NLI models are highly biased towards the non-entailment label on instances with low overlap.
We investigate the reasons for the emergence of the overlap bias and the role of minority examples in its mitigation.
arXiv Detail & Related papers (2022-11-07T21:02:23Z)
- The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identify potential causes for social bias in downstream tasks.
arXiv Detail & Related papers (2022-03-28T09:28:13Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Mitigating Biases in Toxic Language Detection through Invariant Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns.
Our method yields a lower false positive rate on both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z)
- OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings [47.721931801603105]
We propose OSCaR, a bias-mitigating method that focuses on disentangling biased associations between concepts instead of removing concepts wholesale.
Our experiments on gender biases show that OSCaR is a well-balanced approach that ensures that semantic information is retained in the embeddings and bias is also effectively mitigated.
arXiv Detail & Related papers (2020-06-30T18:18:13Z)
- MDR Cluster-Debias: A Nonlinear Word Embedding Debiasing Pipeline [3.180013942295509]
Existing methods for debiasing word embeddings often do so only superficially, in that words that are stereotypically associated with a particular gender can still be clustered together in the debiased space.
This paper explores why this residual clustering exists, and how it might be addressed.
We identify two potential reasons why residual bias exists and develop a new pipeline, MDR Cluster-Debias, to mitigate this bias.
arXiv Detail & Related papers (2020-06-20T20:03:07Z)
- Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases [10.713568409205077]
State-of-the-art neural language models generate dynamic word embeddings dependent on the context in which the word appears.
We introduce the Contextualized Embedding Association Test (CEAT), which can summarize the magnitude of overall bias in neural language models.
We develop two methods, Intersectional Bias Detection (IBD) and Emergent Intersectional Bias Detection (EIBD), to automatically identify the intersectional biases and emergent intersectional biases from static word embeddings.
arXiv Detail & Related papers (2020-06-06T19:49:50Z)
- Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation [94.98656228690233]
We propose a technique that purifies the word embeddings against corpus regularities prior to inferring and removing the gender subspace.
Our approach preserves the distributional semantics of the pre-trained word embeddings while reducing gender bias to a significantly larger degree than prior approaches (a generic subspace-projection sketch for this family of methods appears after this list).
arXiv Detail & Related papers (2020-05-03T02:33:20Z)
- Joint Multiclass Debiasing of Word Embeddings [5.1135133995376085]
We present a joint multiclass debiasing approach capable of debiasing multiple bias dimensions simultaneously.
We show that our approach can reduce or even completely eliminate bias while maintaining meaningful relationships between vectors in word embeddings.
arXiv Detail & Related papers (2020-03-09T22:06:37Z)
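Several of the related papers above operate on a bias subspace in the embedding space; the classic building block they extend or improve upon is hard debiasing, which estimates a bias direction and projects it out of each word vector. A minimal, generic sketch of that projection step is below; it is not any one paper's method, and the word pairs and toy vectors are assumptions for illustration.
```python
import numpy as np

def bias_direction(embeddings, pairs):
    """Estimate a 1-D bias direction as the average normalized difference
    between embeddings of definitional pairs, e.g. ("he", "she")."""
    diffs = []
    for a, b in pairs:
        d = embeddings[a] - embeddings[b]
        diffs.append(d / np.linalg.norm(d))
    direction = np.mean(diffs, axis=0)
    return direction / np.linalg.norm(direction)

def project_out(vector, direction):
    """Remove the component of `vector` along the bias direction."""
    return vector - (vector @ direction) * direction

# Toy 3-D vectors for illustration (real embeddings are 100-300 dimensional).
embeddings = {
    "he": np.array([1.0, 0.2, 0.0]),
    "she": np.array([-1.0, 0.2, 0.0]),
    "engineer": np.array([0.6, 0.5, 0.3]),
}
g = bias_direction(embeddings, [("he", "she")])
debiased_engineer = project_out(embeddings["engineer"], g)
print(debiased_engineer @ g)  # ~0: no remaining component along the bias direction
```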
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.