MABEL: Attenuating Gender Bias using Textual Entailment Data
- URL: http://arxiv.org/abs/2210.14975v1
- Date: Wed, 26 Oct 2022 18:36:58 GMT
- Title: MABEL: Attenuating Gender Bias using Textual Entailment Data
- Authors: Jacqueline He, Mengzhou Xia, Christiane Fellbaum, Danqi Chen
- Abstract summary: We propose MABEL, an intermediate pre-training approach for mitigating gender bias in contextualized representations.
Key to our approach is the use of a contrastive learning objective on counterfactually augmented, gender-balanced entailment pairs.
We show that MABEL outperforms previous task-agnostic debiasing approaches in terms of fairness.
- Score: 20.489427903240017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models encode undesirable social biases, which are
further exacerbated in downstream use. To this end, we propose MABEL (a Method
for Attenuating Gender Bias using Entailment Labels), an intermediate
pre-training approach for mitigating gender bias in contextualized
representations. Key to our approach is the use of a contrastive learning
objective on counterfactually augmented, gender-balanced entailment pairs from
natural language inference (NLI) datasets. We also introduce an alignment
regularizer that pulls identical entailment pairs along opposite gender
directions closer. We extensively evaluate our approach on intrinsic and
extrinsic metrics, and show that MABEL outperforms previous task-agnostic
debiasing approaches in terms of fairness. It also preserves task performance
after fine-tuning on downstream tasks. Together, these findings demonstrate the
suitability of NLI data as an effective means of bias mitigation, as opposed to
only using unlabeled sentences in the literature. Finally, we identify that
existing approaches often use evaluation settings that are insufficient or
inconsistent. We make an effort to reproduce and compare previous methods, and
call for unifying the evaluation settings across gender debiasing methods for
better future comparison.
Related papers
- Gender Bias Mitigation for Bangla Classification Tasks [2.6285986998314783]
We investigate gender bias in Bangla pretrained language models.
By altering names and gender-specific terms, we ensured these datasets were suitable for detecting and mitigating gender bias.
arXiv Detail & Related papers (2024-11-16T00:04:45Z) - Unlabeled Debiasing in Downstream Tasks via Class-wise Low Variance Regularization [13.773597081543185]
We introduce a novel debiasing regularization technique based on the class-wise variance of embeddings.
Our method does not require attribute labels and targets any attribute, thus addressing the shortcomings of existing debiasing methods.
arXiv Detail & Related papers (2024-09-29T03:56:50Z) - Projective Methods for Mitigating Gender Bias in Pre-trained Language Models [10.418595661963062]
Projective methods are fast to implement, use a small number of saved parameters, and make no updates to the existing model parameters.
We find that projective methods can be effective at both intrinsic bias and downstream bias mitigation, but that the two outcomes are not necessarily correlated.
arXiv Detail & Related papers (2024-03-27T17:49:31Z) - DiFair: A Benchmark for Disentangled Assessment of Gender Knowledge and
Bias [13.928591341824248]
Debiasing techniques have been proposed to mitigate the gender bias that is prevalent in pretrained language models.
These are often evaluated on datasets that check the extent to which the model is gender-neutral in its predictions.
This evaluation protocol overlooks the possible adverse impact of bias mitigation on useful gender knowledge.
arXiv Detail & Related papers (2023-10-22T15:27:16Z) - The Impact of Debiasing on the Performance of Language Models in
Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide-range of benchmark datasets.
Experiments show that the effects of debiasing are consistently emphunderestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z) - Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic
Contrast Sets [52.77024349608834]
Vision-language models can perpetuate and amplify societal biases learned during pre-training on uncurated image-text pairs from the internet.
COCO Captions is the most commonly used dataset for evaluating bias between background context and the gender of people in-situ.
We propose a novel dataset debiasing pipeline to augment the COCO dataset with synthetic, gender-balanced contrast sets.
arXiv Detail & Related papers (2023-05-24T17:59:18Z) - Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous
Pronouns [53.62845317039185]
Bias-measuring datasets play a critical role in detecting biased behavior of language models.
We propose a novel method to collect diverse, natural, and minimally distant text pairs via counterfactual generation.
We show that four pre-trained language models are significantly more inconsistent across different gender groups than within each group.
arXiv Detail & Related papers (2023-02-11T12:11:03Z) - Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z) - The Gap on GAP: Tackling the Problem of Differing Data Distributions in
Bias-Measuring Datasets [58.53269361115974]
Diagnostic datasets that can detect biased models are an important prerequisite for bias reduction within natural language processing.
undesired patterns in the collected data can make such tests incorrect.
We introduce a theoretically grounded method for weighting test samples to cope with such patterns in the test data.
arXiv Detail & Related papers (2020-11-03T16:50:13Z) - Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation [94.98656228690233]
We propose a technique that purifies the word embeddings against corpus regularities prior to inferring and removing the gender subspace.
Our approach preserves the distributional semantics of the pre-trained word embeddings while reducing gender bias to a significantly larger degree than prior approaches.
arXiv Detail & Related papers (2020-05-03T02:33:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.