Detecting and Mitigating Indirect Stereotypes in Word Embeddings
- URL: http://arxiv.org/abs/2305.14574v1
- Date: Tue, 23 May 2023 23:23:49 GMT
- Title: Detecting and Mitigating Indirect Stereotypes in Word Embeddings
- Authors: Erin George, Joyce Chew, Deanna Needell
- Abstract summary: Societal biases in the usage of words, including harmful stereotypes, are frequently learned by common word embedding methods.
We propose a novel method called Biased Indirect Relationship Modification (BIRM) to mitigate indirect bias in distributional word embeddings.
- Score: 6.428026202398116
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Societal biases in the usage of words, including harmful stereotypes, are
frequently learned by common word embedding methods. These biases manifest not
only between a word and an explicit marker of its stereotype, but also between
words that share related stereotypes. This latter phenomenon, sometimes called
"indirect bias,'' has resisted prior attempts at debiasing. In this paper, we
propose a novel method called Biased Indirect Relationship Modification (BIRM)
to mitigate indirect bias in distributional word embeddings by modifying biased
relationships between words before embeddings are learned. This is done by
considering how the co-occurrence probability of a given pair of words changes
in the presence of words marking an attribute of bias, and using this to
average out the effect of a bias attribute. To evaluate this method, we perform
a series of common tests and demonstrate that measures of bias in the word
embeddings are reduced in exchange for a minor reduction in the semantic quality
of the embeddings. In addition, we conduct novel tests for measuring indirect
stereotypes by extending the Word Embedding Association Test (WEAT) with new
test sets for indirect binary gender stereotypes. With these tests, we
demonstrate the presence of more subtle stereotypes not addressed by previous
work. The proposed method is able to reduce the presence of some of these new
stereotypes, serving as a crucial next step towards non-stereotyped word
embeddings.
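The abstract describes BIRM as modifying biased co-occurrence relationships before training: it examines how the co-occurrence probability of a word pair shifts when words marking a bias attribute are present, and averages that effect out. The sketch below is one hedged reading of that idea, marginalizing P(context | word, attribute) against the corpus-level attribute distribution instead of the word's own attribute distribution; the marker sets, window bucketing, and rescaling are illustrative assumptions, not the authors' exact formulation.

```python
from collections import Counter

# Assumed binary gender marker sets (illustrative; the paper's sets may differ).
FEMALE = {"she", "her", "hers", "woman", "women"}
MALE = {"he", "him", "his", "man", "men"}

def bucket_of(window):
    """Assign a context window to a bias-attribute bucket."""
    toks = set(window)
    if toks & FEMALE and not toks & MALE:
        return "f"
    if toks & MALE and not toks & FEMALE:
        return "m"
    return "n"  # neutral or mixed

def collect_counts(corpus, window=5):
    """Per-bucket (word, context) counts, per-bucket word totals, and bucket sizes."""
    pair = {b: Counter() for b in "fmn"}
    word = {b: Counter() for b in "fmn"}
    bucket_mass = Counter()
    for sent in corpus:  # corpus: iterable of token lists
        for i, w in enumerate(sent):
            ctx = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            b = bucket_of(ctx)
            bucket_mass[b] += len(ctx)
            for c in ctx:
                pair[b][(w, c)] += 1
                word[b][w] += 1
    return pair, word, bucket_mass

def adjusted_count(w, c, pair, word, bucket_mass):
    """Replace P(c|w) with sum_a P(a) * P(c|w,a): the attribute distribution comes
    from the whole corpus rather than from w's own contexts, which averages out
    w's association with the bias attribute; then rescale back to a count."""
    total_mass = sum(bucket_mass.values())
    total_w = sum(word[b][w] for b in "fmn")
    p_c_given_w = 0.0
    for b in "fmn":
        if word[b][w] == 0:
            continue
        p_a = bucket_mass[b] / total_mass                   # corpus-level P(a)
        p_c_given_w += p_a * pair[b][(w, c)] / word[b][w]   # P(c | w, a)
    return total_w * p_c_given_w
```

Under this reading, the adjusted counts would then be fed to a count-based embedding method trained on the modified co-occurrence statistics, so the learned vectors never see the attribute-skewed counts.

The evaluation extends the Word Embedding Association Test. Below is a minimal sketch of the standard WEAT effect size (Caliskan et al., 2017) applied to hypothetical indirect-stereotype word lists in which neither the targets nor the attributes contain explicit gender markers; the word lists and the `emb` lookup are placeholders, not the paper's released test sets.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B, emb):
    """s(w, A, B): mean similarity of w to attribute set A minus to attribute set B."""
    return (np.mean([cosine(emb[w], emb[a]) for a in A])
            - np.mean([cosine(emb[w], emb[b]) for b in B]))

def weat_effect_size(X, Y, A, B, emb):
    """Cohen's-d-style WEAT effect size over target sets X, Y and attribute sets A, B."""
    s_X = [association(x, A, B, emb) for x in X]
    s_Y = [association(y, A, B, emb) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# Hypothetical indirect-stereotype test sets (illustrative only): no word is an
# explicit gender marker, so any measured association reflects indirect bias.
X = ["nurse", "receptionist", "librarian"]   # stereotypically female-coded occupations
Y = ["engineer", "carpenter", "pilot"]       # stereotypically male-coded occupations
A = ["ballet", "cosmetics", "softball"]      # female-coded interests and objects
B = ["football", "whiskey", "chess"]         # male-coded interests and objects

# emb: dict mapping word -> numpy vector, e.g. loaded from the trained embeddings.
# effect = weat_effect_size(X, Y, A, B, emb)  # values near 0 indicate weaker measured bias
```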
Related papers
- Mitigating Gender Bias in Contextual Word Embeddings [1.208453901299241]
We propose a novel objective function for Lipstick (Masked-Language Modeling) which largely mitigates the gender bias in contextual embeddings.
We also propose new methods for debiasing static embeddings and provide empirical proof via extensive analysis and experiments.
arXiv Detail & Related papers (2024-11-18T21:36:44Z) - Debiasing Sentence Embedders through Contrastive Word Pairs [46.9044612783003]
We explore an approach to remove linear and nonlinear bias information for NLP solutions.
We compare our approach to common debiasing methods on classical bias metrics and on bias metrics which take nonlinear information into account.
arXiv Detail & Related papers (2024-03-27T13:34:59Z) - Looking at the Overlooked: An Analysis on the Word-Overlap Bias in Natural Language Inference [20.112129592923246]
We focus on an overlooked aspect of the overlap bias in NLI models: the reverse word-overlap bias.
Current NLI models are highly biased towards the non-entailment label on instances with low overlap.
We investigate the reasons for the emergence of the overlap bias and the role of minority examples in its mitigation.
arXiv Detail & Related papers (2022-11-07T21:02:23Z) - The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identifying potential causes for social bias in downstream tasks.
arXiv Detail & Related papers (2022-03-28T09:28:13Z) - Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z) - Mitigating Biases in Toxic Language Detection through Invariant Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns.
Our method yields a lower false positive rate in both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z) - OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings [47.721931801603105]
We propose OSCaR, a bias-mitigating method that focuses on disentangling biased associations between concepts instead of removing concepts wholesale.
Our experiments on gender biases show that OSCaR is a well-balanced approach that ensures that semantic information is retained in the embeddings and bias is also effectively mitigated.
arXiv Detail & Related papers (2020-06-30T18:18:13Z) - Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases [10.713568409205077]
State-of-the-art neural language models generate dynamic word embeddings dependent on the context in which the word appears.
We introduce the Contextualized Embedding Association Test (CEAT), which can summarize the magnitude of overall bias in neural language models.
We develop two methods, Intersectional Bias Detection (IBD) and Emergent Intersectional Bias Detection (EIBD), to automatically identify the intersectional biases and emergent intersectional biases from static word embeddings.
arXiv Detail & Related papers (2020-06-06T19:49:50Z) - Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation [94.98656228690233]
We propose a technique that purifies the word embeddings against corpus regularities prior to inferring and removing the gender subspace.
Our approach preserves the distributional semantics of the pre-trained word embeddings while reducing gender bias to a significantly larger degree than prior approaches.
arXiv Detail & Related papers (2020-05-03T02:33:20Z) - Joint Multiclass Debiasing of Word Embeddings [5.1135133995376085]
We present a joint multiclass debiasing approach capable of debiasing multiple bias dimensions simultaneously.
We show that our concepts can reduce or even completely eliminate bias, while maintaining meaningful relationships between vectors in word embeddings.
arXiv Detail & Related papers (2020-03-09T22:06:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.