Adversarial Examples Generation for Reducing Implicit Gender Bias in
Pre-trained Models
- URL: http://arxiv.org/abs/2110.01094v1
- Date: Sun, 3 Oct 2021 20:22:54 GMT
- Title: Adversarial Examples Generation for Reducing Implicit Gender Bias in
Pre-trained Models
- Authors: Wenqian Ye, Fei Xu, Yaojia Huang, Cassie Huang, Ji A
- Abstract summary: We propose a method to automatically generate implicit gender bias samples at the sentence level and a metric to measure gender bias.
The metric is used to guide the generation of examples from pre-trained models; those examples can then be used to mount attacks on pre-trained models.
- Score: 2.6329024988388925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over the last few years, Contextualized Pre-trained Neural Language Models,
such as BERT and GPT, have shown significant gains in various NLP tasks. One way
to enhance the robustness of existing pre-trained models is to generate and
evaluate adversarial examples for data augmentation or adversarial learning.
Meanwhile, the gender bias embedded in these models is a serious problem in
practical applications. Much research has covered the gender bias produced by
word-level information (e.g., gender-stereotypical occupations), while few
studies have investigated sentence-level and implicit cases.
In this paper, we propose a method to automatically generate implicit gender
bias samples at the sentence level and a metric to measure gender bias. Samples
generated by our method are evaluated in terms of accuracy. The metric is used
to guide the generation of examples from pre-trained models, so those examples
can be used to mount attacks on pre-trained models. Finally, we discuss the
efficacy of our generated examples in reducing gender bias as a direction for
future research.
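The abstract does not specify the bias metric or the generation procedure. Purely as an illustration of the kind of sentence-level metric it describes, the sketch below scores candidate sentences by the gap between the probabilities a masked language model assigns to "he" versus "she" in a masked slot, so the highest-scoring candidates could be kept as attack examples; the Hugging Face transformers pipeline and the BERT checkpoint are assumptions, not choices stated in the paper.
```python
# Illustrative only: a masked-LM pronoun-probability gap as a sentence-level
# gender bias score, NOT the paper's actual metric.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def bias_score(template: str) -> float:
    """Template must contain exactly one [MASK] slot where a pronoun fits.
    Returns |P(he) - P(she)|; larger values suggest a stronger gender skew."""
    probs = {"he": 0.0, "she": 0.0}
    for pred in fill_mask(template, targets=["he", "she"]):
        probs[pred["token_str"]] = pred["score"]
    return abs(probs["he"] - probs["she"])

# Rank candidate sentences so the most biased ones can be used as attacks.
candidates = [
    "[MASK] is a nurse who stayed late to comfort the patient.",
    "[MASK] is an engineer who fixed the server overnight.",
]
for sent in sorted(candidates, key=bias_score, reverse=True):
    print(f"{bias_score(sent):.4f}  {sent}")
```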
Related papers
- DiFair: A Benchmark for Disentangled Assessment of Gender Knowledge and Bias [13.928591341824248]
Debiasing techniques have been proposed to mitigate the gender bias that is prevalent in pretrained language models.
These are often evaluated on datasets that check the extent to which the model is gender-neutral in its predictions.
This evaluation protocol overlooks the possible adverse impact of bias mitigation on useful gender knowledge.
arXiv Detail & Related papers (2023-10-22T15:27:16Z)
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z)
- Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions [50.67412723291881]
Societal biases present in pre-trained large language models are a critical issue.
We propose data intervention strategies as a powerful yet simple technique to reduce gender bias in pre-trained models.
arXiv Detail & Related papers (2023-06-07T16:50:03Z)
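The summary above does not spell out the interventions. A minimal sketch of one widely used intervention of this kind, counterfactual gender swapping of a handful of fine-tuning examples, is shown below; the word list and helper are illustrative assumptions, not the paper's recipe.
```python
# Illustrative only: counterfactual gender swapping as a few-shot data
# intervention; the pair list and helper are assumptions, not the paper's.
import re

# One-direction (male -> female) swaps; a full intervention would also handle
# the reverse direction, names, and the his/her ambiguity more carefully.
SWAPS = {"he": "she", "him": "her", "his": "her",
         "man": "woman", "father": "mother", "mr": "ms"}

def gender_swap(sentence: str) -> str:
    """Return a counterfactual twin of `sentence` with male terms swapped."""
    def repl(m: re.Match) -> str:
        swapped = SWAPS[m.group(0).lower()]
        return swapped.capitalize() if m.group(0)[0].isupper() else swapped
    pattern = r"\b(" + "|".join(SWAPS) + r")\b"
    return re.sub(pattern, repl, sentence, flags=re.IGNORECASE)

few_shot = ["He is a doctor and his patients trust him."]
augmented = few_shot + [gender_swap(s) for s in few_shot]
print(augmented)  # swapped copies are added to the small fine-tuning set
```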
- Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
arXiv Detail & Related papers (2023-05-24T04:27:40Z)
- Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting [88.83117372793737]
Forgetting information in the original training data may damage the model's downstream performance by a large margin.
We propose GEnder Equality Prompt (GEEP) to improve gender fairness of pre-trained models with less forgetting.
arXiv Detail & Related papers (2021-10-11T15:52:16Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
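The summary names instance reweighting without detail. Purely as an illustrative sketch (not the paper's exact scheme), one common choice is to weight each training instance by the inverse frequency of its (label, demographic attribute) combination, so correlations between author demographics and labels are balanced out in the training loss.
```python
# Illustrative only: inverse-frequency instance weights over
# (label, demographic attribute) pairs; not the paper's exact scheme.
from collections import Counter

def instance_weights(labels, demographics):
    """One weight per instance, inversely proportional to how common its
    (label, demographic) combination is, normalized to mean 1.0."""
    pairs = list(zip(labels, demographics))
    counts = Counter(pairs)
    raw = [1.0 / counts[p] for p in pairs]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

labels = ["pos", "pos", "pos", "neg"]
demographics = ["female", "female", "male", "male"]
print(instance_weights(labels, demographics))
# Rare combinations (e.g. pos/male) get up-weighted in the training loss.
```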
- Evaluating Gender Bias in Natural Language Inference [5.034017602990175]
We propose an evaluation methodology to measure gender bias in natural language understanding through inference.
We use our challenge task to investigate state-of-the-art NLI models on the presence of gender stereotypes using occupations.
Our findings suggest that three models trained on MNLI and SNLI datasets are significantly prone to gender-induced prediction errors.
arXiv Detail & Related papers (2021-05-12T09:41:51Z)
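The evaluation methodology is summarized only briefly above. A minimal sketch of the general idea, probing an off-the-shelf NLI model with gendered premise/hypothesis pairs built around occupations, follows; the checkpoint and templates below are illustrative assumptions, not the paper's benchmark.
```python
# Illustrative only: probe an NLI model with gender/occupation templates;
# the checkpoint and templates are assumptions, not the paper's benchmark.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")
model.eval()

def nli_label(premise: str, hypothesis: str) -> str:
    """Return the predicted NLI label for a premise/hypothesis pair."""
    inputs = tok(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[int(logits.argmax(dim=-1))]

for occupation in ("nurse", "engineer"):
    premise = f"The {occupation} finished the report before the meeting."
    for pronoun in ("He", "She"):
        print(occupation, pronoun, nli_label(premise, f"{pronoun} is a {occupation}."))
# A gender-neutral premise should get the same label for "He" and "She";
# systematic differences indicate gender-induced prediction errors.
```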
- Impact of Gender Debiased Word Embeddings in Language Modeling [0.0]
Gender, racial, and social biases have been detected as evident examples of unfairness in applications of Natural Language Processing.
Recent studies have shown that the human-generated data used in training is an apparent source of these biases.
Current algorithms have also been shown to amplify biases present in the data.
arXiv Detail & Related papers (2021-05-03T14:45:10Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.