Evaluating Bias In Dutch Word Embeddings
- URL: http://arxiv.org/abs/2011.00244v2
- Date: Tue, 3 Nov 2020 22:34:44 GMT
- Title: Evaluating Bias In Dutch Word Embeddings
- Authors: Rodrigo Alejandro Chávez Mulsa and Gerasimos Spanakis
- Abstract summary: We implement the Word Embeddings Association Test (WEAT), Clustering and Sentence Embeddings Association Test (SEAT) methods to quantify the gender bias in Dutch word embeddings.
We analyze the effect of the debiasing techniques on downstream tasks, finding a negligible impact on traditional embeddings and a 2% performance drop for contextualized embeddings.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research in Natural Language Processing has revealed that word
embeddings can encode social biases present in the training data, which can
affect minorities in real-world applications. This paper explores the gender
bias implicit in Dutch embeddings while investigating whether approaches
developed for English can also be used for Dutch. We implement the Word Embeddings
Association Test (WEAT), Clustering and Sentence Embeddings Association Test
(SEAT) methods to quantify the gender bias in Dutch word embeddings; we then
reduce the bias with the Hard-Debias and Sent-Debias mitigation methods and
finally evaluate the performance of the debiased embeddings on downstream
tasks. The results suggest that gender bias, among other biases, is present
in both traditional and contextualized Dutch word embeddings. We highlight
how bias measurement and mitigation techniques created for English can be
applied to Dutch embeddings by adequately translating the data and taking
into account the unique characteristics of the language. Furthermore, we
analyze the effect of the debiasing techniques on downstream tasks, finding
a negligible impact on traditional embeddings and a 2% performance drop for contextualized
embeddings. Finally, we release the translated Dutch datasets to the public
along with the traditional embeddings with mitigated bias.
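For concreteness, the WEAT measurement used above amounts to a Cohen's-d-style effect size over cosine similarities between target and attribute word sets. Below is a minimal sketch, assuming an embedding lookup dict `emb` (word to numpy vector); the Dutch word lists are illustrative placeholders, not the paper's translated datasets.

```python
# Minimal WEAT effect-size sketch (Caliskan et al., 2017 formulation),
# assuming `emb` maps words to numpy vectors.
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def s(w, A, B, emb):
    # Differential association of word w with attribute sets A and B.
    return (np.mean([cos(emb[w], emb[a]) for a in A])
            - np.mean([cos(emb[w], emb[b]) for b in B]))

def weat_effect_size(X, Y, A, B, emb):
    # Cohen's-d-style effect size comparing the two target sets.
    sx = [s(x, A, B, emb) for x in X]
    sy = [s(y, A, B, emb) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

# Illustrative Dutch target/attribute sets (not the paper's exact lists):
X = ["carrière", "salaris", "kantoor"]   # career-related targets
Y = ["familie", "huis", "kinderen"]      # family-related targets
A = ["man", "jongen", "vader"]           # male attribute words
B = ["vrouw", "meisje", "moeder"]        # female attribute words
# d = weat_effect_size(X, Y, A, B, emb)  # larger |d| -> stronger association
```

For significance, a permutation test would re-partition X ∪ Y into random halves and recompute the association difference; the sketch above only covers the effect size.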
Related papers
- Mitigating Gender Bias in Contextual Word Embeddings [1.208453901299241]
We propose a novel objective function for Lipstick (Masked-Language Modeling), which largely mitigates the gender bias in contextual embeddings.
We also propose new methods for debiasing static embeddings and provide empirical proof via extensive analysis and experiments.
arXiv Detail & Related papers (2024-11-18T21:36:44Z)
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z)
- MABEL: Attenuating Gender Bias using Textual Entailment Data [20.489427903240017]
We propose MABEL, an intermediate pre-training approach for mitigating gender bias in contextualized representations.
Key to our approach is the use of a contrastive learning objective on counterfactually augmented, gender-balanced entailment pairs.
We show that MABEL outperforms previous task-agnostic debiasing approaches in terms of fairness.
arXiv Detail & Related papers (2022-10-26T18:36:58Z)
- Social Biases in Automatic Evaluation Metrics for NLG [53.76118154594404]
We propose an evaluation method based on the Word Embeddings Association Test (WEAT) and the Sentence Embeddings Association Test (SEAT) to quantify social biases in evaluation metrics.
We construct gender-swapped meta-evaluation datasets to explore the potential impact of gender bias in image captioning and text summarization tasks.
arXiv Detail & Related papers (2022-10-17T08:55:26Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z)
- Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias [12.4543414590979]
Contextualized word embeddings have been replacing standard embeddings in NLP systems.
We measure gender bias by studying associations between gender-denoting target words and names of professions in English and German.
We show that our method of measuring bias is appropriate for languages with rich morphology and gender-marking, such as German.
arXiv Detail & Related papers (2020-10-27T18:06:09Z)
- Fair Embedding Engine: A Library for Analyzing and Mitigating Gender Bias in Word Embeddings [16.49645205111334]
Non-contextual word embedding models have been shown to inherit human-like stereotypical biases of gender, race and religion from the training corpora.
This paper describes Fair Embedding Engine (FEE), a library for analysing and mitigating gender bias in word embeddings.
arXiv Detail & Related papers (2020-10-25T17:31:12Z)
- Towards Debiasing Sentence Representations [109.70181221796469]
We show that Sent-Debias is effective in removing biases, and at the same time, preserves performance on sentence-level downstream tasks.
We hope that our work will inspire future research on characterizing and removing social biases from widely adopted sentence representations for fairer NLP.
arXiv Detail & Related papers (2020-07-16T04:22:30Z)
- Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation [94.98656228690233]
We propose a technique that purifies the word embeddings against corpus regularities prior to inferring and removing the gender subspace.
Our approach preserves the distributional semantics of the pre-trained word embeddings while reducing gender bias to a significantly larger degree than prior approaches.
arXiv Detail & Related papers (2020-05-03T02:33:20Z)
- Joint Multiclass Debiasing of Word Embeddings [5.1135133995376085]
We present a joint multiclass debiasing approach capable of debiasing multiple bias dimensions simultaneously.
We show that our approach can reduce or even completely eliminate bias, while maintaining meaningful relationships between vectors in word embeddings.
arXiv Detail & Related papers (2020-03-09T22:06:37Z)
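Hard-Debias and Sent-Debias, applied in the main paper and refined by Double-Hard Debias above, share one core mechanic: estimate a bias direction (from definitional word pairs, or from sentence templates in Sent-Debias) and remove each vector's projection onto it. Below is a minimal sketch of the word-level variant, assuming a word-to-vector dict `emb`; the Dutch definitional pairs are illustrative, not the paper's lists.

```python
# Minimal sketch of the hard-debias neutralize step (Bolukbasi et al., 2016),
# assuming `emb` maps words to numpy vectors.
import numpy as np

def bias_direction(pairs, emb):
    # For each definitional pair, center the two vectors and keep both
    # centered differences; rows have zero mean by construction, so the
    # first right-singular vector equals the top PCA component, i.e. the
    # estimated gender direction.
    diffs = []
    for a, b in pairs:
        center = (emb[a] + emb[b]) / 2
        diffs.append(emb[a] - center)
        diffs.append(emb[b] - center)
    _, _, vt = np.linalg.svd(np.array(diffs))
    return vt[0]

def neutralize(v, g):
    # Remove v's projection onto the bias direction g and re-normalize.
    v = v - (v @ g) * g / (g @ g)
    return v / np.linalg.norm(v)

# Illustrative Dutch definitional pairs (not the paper's exact lists):
pairs = [("man", "vrouw"), ("hij", "zij"), ("vader", "moeder")]
# g = bias_direction(pairs, emb)
# emb["dokter"] = neutralize(emb["dokter"], g)  # e.g., a profession word
```

Double-Hard Debias inserts an extra step before this, removing frequency-related components that otherwise distort the PCA estimate of the gender direction.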