The Undesirable Dependence on Frequency of Gender Bias Metrics Based on
Word Embeddings
- URL: http://arxiv.org/abs/2301.00792v1
- Date: Mon, 2 Jan 2023 18:27:10 GMT
- Title: The Undesirable Dependence on Frequency of Gender Bias Metrics Based on
Word Embeddings
- Authors: Francisco Valentini, Germán Rosati, Diego Fernandez Slezak, Edgar Altszyler
- Abstract summary: We study the effect of frequency when measuring female vs. male gender bias with word embedding-based bias quantification methods.
We find that Skip-gram with negative sampling and GloVe tend to detect male bias in high frequency words, while GloVe tends to return female bias in low frequency words.
These behaviors persist even when the corpus is randomly shuffled, which shows that the frequency-based effect observed in unshuffled corpora stems from properties of the metric rather than from word associations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerous works use word embedding-based metrics to quantify societal biases
and stereotypes in texts. Recent studies have found that word embeddings can
capture semantic similarity but may be affected by word frequency. In this work
we study the effect of frequency when measuring female vs. male gender bias
with word embedding-based bias quantification methods. We find that Skip-gram
with negative sampling and GloVe tend to detect male bias in high frequency
words, while GloVe tends to return female bias in low frequency words. We show
these behaviors still exist when words are randomly shuffled. This proves that
the frequency-based effect observed in unshuffled corpora stems from properties
of the metric rather than from word associations. The effect is spurious and
problematic since bias metrics should depend exclusively on word co-occurrences
and not individual word frequencies. Finally, we compare these results with the
ones obtained with an alternative metric based on Pointwise Mutual Information.
We find that this metric does not show a clear dependence on frequency, even
though it is slightly skewed towards male bias across all frequencies.
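As a concrete illustration of the two families of metrics the abstract contrasts, here is a minimal sketch; the attribute word lists, window size, and exact formulations are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter
import numpy as np

# Illustrative context/attribute word lists; the paper's exact lists may differ.
FEMALE_WORDS = {"she", "her", "woman", "female"}
MALE_WORDS = {"he", "him", "man", "male"}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def embedding_bias(word, vectors):
    """Embedding-based female-vs-male bias: mean cosine similarity of `word`
    to the female attribute words minus its mean similarity to the male ones.
    `vectors` maps words to numpy arrays; positive output = female bias."""
    w = vectors[word]
    f = np.mean([cosine(w, vectors[a]) for a in FEMALE_WORDS if a in vectors])
    m = np.mean([cosine(w, vectors[a]) for a in MALE_WORDS if a in vectors])
    return f - m

def cooccurrences(sentences, window=5):
    """Symmetric (word, context) co-occurrence counts within a fixed window."""
    pairs = Counter()
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for c in tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]:
                pairs[(w, c)] += 1
    return pairs

def pmi_bias(word, pairs):
    """PMI-based bias: log P(word | female contexts) - log P(word | male
    contexts). Because both terms condition on the context sets, the raw
    frequency of `word` largely cancels in the difference."""
    def log_cond(ctx):
        joint = sum(pairs[(word, c)] for c in ctx)
        total = sum(n for (_, c), n in pairs.items() if c in ctx)
        return math.log(joint / total) if joint and total else float("-inf")
    return log_cond(FEMALE_WORDS) - log_cond(MALE_WORDS)
```

With Skip-gram or GloVe vectors, embedding_bias is the kind of score the paper reports as frequency-dependent, while pmi_bias operates directly on co-occurrence counts.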
Related papers
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words), a benchmark for evaluating gender-inclusive translation.
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z)
- Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous Pronouns [53.62845317039185]
Bias-measuring datasets play a critical role in detecting biased behavior of language models.
We propose a novel method to collect diverse, natural, and minimally distant text pairs via counterfactual generation.
We show that four pre-trained language models are significantly more inconsistent across different gender groups than within each group.
arXiv Detail & Related papers (2023-02-11T12:11:03Z)
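The summary does not spell out Counter-GAP's generation pipeline, but the core idea of minimally distant counterfactual pairs can be sketched with a simple gender-term swap (the swap table is a toy assumption; the paper's method is more careful about names, grammar, and ambiguous pronouns):

```python
import re

# Toy swap table; a real pipeline must handle ambiguity (e.g. "her" can map
# to "him" or "his") and named entities.
GENDER_SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
                "his": "her", "hers": "his", "man": "woman", "woman": "man"}

def counterfactual(sentence):
    """Build the minimally distant pair member by swapping gendered words
    while preserving capitalization."""
    def swap(m):
        word = m.group(0)
        repl = GENDER_SWAPS.get(word.lower(), word)
        return repl.capitalize() if word[0].isupper() else repl
    return re.sub(r"\b\w+\b", swap, sentence)

print(counterfactual("He is a brilliant engineer."))
# -> "She is a brilliant engineer."
```

A consistency evaluation then compares a model's behaviour on the two pair members; gaps across gender groups signal the inconsistency the paper reports.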
- Investigating the Frequency Distortion of Word Embeddings and Its Impact on Bias Metrics [2.1374208474242815]
We systematically study the association between frequency and semantic similarity in several static word embeddings.
We find that Skip-gram, GloVe and FastText embeddings tend to produce higher semantic similarity between high-frequency words than between other frequency combinations.
arXiv Detail & Related papers (2022-11-15T15:11:06Z)
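A minimal sketch of such a frequency-stratified check, assuming `vectors` (word to numpy array) and `freqs` (word to corpus count) are available as hypothetical inputs:

```python
import random
import numpy as np

def mean_similarity_by_band(vectors, freqs, band_size=200, n_pairs=1000, seed=0):
    """Mean cosine similarity of random word pairs drawn from the
    highest-frequency band vs. the lowest-frequency band of the vocabulary."""
    rng = random.Random(seed)
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    bands = {"high": ranked[:band_size], "low": ranked[-band_size:]}
    result = {}
    for name, words in bands.items():
        sims = []
        for _ in range(n_pairs):
            a, b = rng.sample(words, 2)
            u, v = vectors[a], vectors[b]
            sims.append(float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))))
        result[name] = float(np.mean(sims))
    return result  # the distortion shows up as result["high"] >> result["low"]
```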
- Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency, Syntax, and Semantics [3.4048739113355215]
We provide a comprehensive analysis of group-based biases in widely-used static English word embeddings trained on internet corpora.
Using the Single-Category Word Embedding Association Test, we demonstrate the widespread prevalence of gender biases.
We find that, of the 1,000 most frequent words in the vocabulary, 77% are more associated with men than with women.
arXiv Detail & Related papers (2022-06-07T15:35:10Z)
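The statistic behind that figure, the Single-Category WEAT, can be sketched as a standardized similarity difference (the attribute lists here are illustrative):

```python
import numpy as np

MALE_ATTRS = ["he", "him", "his", "man", "male"]          # illustrative
FEMALE_ATTRS = ["she", "her", "hers", "woman", "female"]  # illustrative

def sc_weat(word, vectors):
    """SC-WEAT-style effect size: difference between the word's mean cosine
    similarity to male vs. female attributes, divided by the standard
    deviation over all attribute similarities. Positive = male-associated."""
    w = vectors[word] / np.linalg.norm(vectors[word])
    def cos(a):
        v = vectors[a]
        return float(np.dot(w, v / np.linalg.norm(v)))
    male = [cos(a) for a in MALE_ATTRS if a in vectors]
    female = [cos(a) for a in FEMALE_ATTRS if a in vectors]
    return (np.mean(male) - np.mean(female)) / np.std(male + female, ddof=1)

def share_male_associated(words, vectors):
    """Fraction of words with a positive (male-leaning) effect size,
    mirroring the 77%-of-top-1000 computation described above."""
    scores = [sc_weat(w, vectors) for w in words if w in vectors]
    return sum(s > 0 for s in scores) / len(scores)
```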
- Identifying and Mitigating Gender Bias in Hyperbolic Word Embeddings [34.378806636170616]
We extend the study of gender bias to the recently popularized hyperbolic word embeddings.
We propose gyrocosine bias, a novel measure for quantifying gender bias in hyperbolic word representations.
Experiments on a suite of evaluation tests show that Poincaré Gender Debias (PGD) effectively reduces bias while adding a minimal semantic offset.
arXiv Detail & Related papers (2021-09-28T14:43:37Z)
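The gyrocosine measure itself is not defined in the summary; as a plainly simplified stand-in, a hyperbolic bias score can be built from the standard Poincaré-ball distance (the anchor words and the distance-difference form are assumptions, not the paper's gyrocosine bias):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between points inside the unit Poincare ball."""
    sq_diff = float(np.sum((u - v) ** 2))
    denom = (1.0 - float(np.sum(u ** 2))) * (1.0 - float(np.sum(v ** 2)))
    return float(np.arccosh(1.0 + 2.0 * sq_diff / max(denom, eps)))

def hyperbolic_gender_bias(word, vectors, female=("she",), male=("he",)):
    """Distance-difference bias: positive when `word` sits closer (in
    hyperbolic distance) to the female anchors than to the male ones."""
    w = vectors[word]
    d_f = np.mean([poincare_distance(w, vectors[a]) for a in female])
    d_m = np.mean([poincare_distance(w, vectors[a]) for a in male])
    return float(d_m - d_f)
```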
- Robustness and Reliability of Gender Bias Assessment in Word Embeddings: The Role of Base Pairs [23.574442657224008]
It has been shown that word embeddings can exhibit gender bias, and various methods have been proposed to quantify this.
Previous work has leveraged gender word pairs to measure bias and extract biased analogies.
We show that the reliance on these gendered pairs has strong limitations.
In particular, the well-known analogy "man is to computer-programmer as woman is to homemaker" is due to word similarity rather than societal bias.
arXiv Detail & Related papers (2020-10-06T16:09:05Z)
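That analogy is conventionally computed with the 3CosAdd rule, sketched below over a generic word-to-vector mapping; the result is just a nearest-neighbour query, which is why plain word similarity can drive the famous "homemaker" answer:

```python
import numpy as np

def three_cos_add(a, a_star, b, vectors):
    """Solve 'a is to a_star as b is to ?' by finding the vocabulary word
    most cosine-similar to b - a + a_star. Query words are excluded, as is
    conventional -- itself a choice that can mask similarity effects."""
    target = vectors[b] - vectors[a] + vectors[a_star]
    target /= np.linalg.norm(target)
    best, best_sim = None, -float("inf")
    for word, v in vectors.items():
        if word in (a, a_star, b):
            continue
        sim = float(np.dot(target, v / np.linalg.norm(v)))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# e.g. three_cos_add("man", "computer_programmer", "woman", vecs)
```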
- Gender Stereotype Reinforcement: Measuring the Gender Bias Conveyed by Ranking Algorithms [68.85295025020942]
We propose the Gender Stereotype Reinforcement (GSR) measure, which quantifies the tendency of a search engine to support gender stereotypes.
GSR is the first measure specifically tailored to Information Retrieval that is capable of quantifying representational harms.
arXiv Detail & Related papers (2020-09-02T20:45:04Z)
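GSR's exact definition is not given in the summary; as a loudly simplified illustration of measuring stereotype reinforcement in a ranking, the sketch below computes a rank-discounted gender-exposure gap (the DCG-style discount and per-document gender scores are assumptions, not GSR itself):

```python
import math

def exposure_gap(ranking, gender_score):
    """Rank-discounted gender exposure of a ranked result list.
    `gender_score[doc]` is in [-1, 1] (negative = female-leaning,
    positive = male-leaning); positive output = male-skewed ranking."""
    gap = norm = 0.0
    for rank, doc in enumerate(ranking, start=1):
        discount = 1.0 / math.log2(rank + 1)  # top ranks get more exposure
        gap += discount * gender_score[doc]
        norm += discount
    return gap / norm if norm else 0.0
```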
- Nurse is Closer to Woman than Surgeon? Mitigating Gender-Biased Proximities in Word Embeddings [37.65897382453336]
Existing post-processing methods for debiasing word embeddings are unable to mitigate gender bias hidden in the spatial arrangement of word vectors.
We propose RAN-Debias, a novel gender debiasing methodology which not only eliminates the bias present in a word vector but also alters the spatial distribution of its neighbouring vectors.
We also propose a new bias evaluation metric, the Gender-based Illicit Proximity Estimate (GIPE).
arXiv Detail & Related papers (2020-06-02T20:50:43Z)
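GIPE's exact formula is not given here; a minimal neighbourhood-based illustration of the underlying idea scores a word by the share of its nearest neighbours that carry a strong precomputed gender-bias score (k and the threshold are assumptions):

```python
import numpy as np

def neighbourhood_bias_share(word, vectors, bias, k=10, threshold=0.05):
    """Fraction of `word`'s k nearest cosine neighbours whose gender-bias
    score `bias[n]` (from any bias metric) exceeds `threshold` in magnitude."""
    w = vectors[word] / np.linalg.norm(vectors[word])
    sims = {other: float(np.dot(w, v / np.linalg.norm(v)))
            for other, v in vectors.items() if other != word}
    neighbours = sorted(sims, key=sims.get, reverse=True)[:k]
    return sum(abs(bias[n]) > threshold for n in neighbours) / k
```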
- Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation [94.98656228690233]
We propose a technique that purifies the word embeddings against corpus regularities prior to inferring and removing the gender subspace.
Our approach preserves the distributional semantics of the pre-trained word embeddings while reducing gender bias to a significantly larger degree than prior approaches.
arXiv Detail & Related papers (2020-05-03T02:33:20Z)
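Double-Hard Debias extends the classic hard-debias projection; the sketch below shows that base step only (estimating the gender direction from two definitional pairs is a simplification, and the "double" part, removing frequency directions first, is omitted):

```python
import numpy as np

def gender_direction(vectors, pairs=(("he", "she"), ("man", "woman"))):
    """Estimate a gender direction as the mean of normalized differences of
    definitional pairs (Bolukbasi et al. use PCA over more pairs)."""
    diffs = [vectors[a] - vectors[b] for a, b in pairs]
    g = np.mean([d / np.linalg.norm(d) for d in diffs], axis=0)
    return g / np.linalg.norm(g)

def hard_debias(vec, g):
    """Remove the component of `vec` along the gender direction `g`."""
    return vec - np.dot(vec, g) * g
```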
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)