Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous Pronouns
- URL: http://arxiv.org/abs/2302.05674v1
- Date: Sat, 11 Feb 2023 12:11:03 GMT
- Title: Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous Pronouns
- Authors: Zhongbin Xie, Vid Kocijan, Thomas Lukasiewicz, Oana-Maria Camburu
- Abstract summary: Bias-measuring datasets play a critical role in detecting biased behavior of language models.
We propose a novel method to collect diverse, natural, and minimally distant text pairs via counterfactual generation.
We show that four pre-trained language models are significantly more inconsistent across different gender groups than within each group.
- Score: 53.62845317039185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bias-measuring datasets play a critical role in detecting biased behavior of
language models and in evaluating progress of bias mitigation methods. In this
work, we focus on evaluating gender bias through coreference resolution, where
previous datasets are either hand-crafted or fail to reliably measure an
explicitly defined bias. To overcome these shortcomings, we propose a novel
method to collect diverse, natural, and minimally distant text pairs via
counterfactual generation, and construct Counter-GAP, an annotated dataset
consisting of 4008 instances grouped into 1002 quadruples. We further identify
a bias cancellation problem in previous group-level metrics on Counter-GAP, and
propose to use the difference between inconsistency across genders and within
genders to measure bias at a quadruple level. Our results show that four
pre-trained language models are significantly more inconsistent across
different gender groups than within each group, and that a name-based
counterfactual data augmentation method is more effective at mitigating such bias
than an anonymization-based method.
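The quadruple-level metric described in the abstract can be illustrated with a minimal sketch (this is not the authors' released code): assuming each Counter-GAP quadruple yields one coreference prediction for each of two masculine and two feminine minimally distant variants, bias is estimated as the gap between how often predictions disagree across genders and how often they disagree within a gender. The class name, field names, and pairing scheme below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class QuadruplePredictions:
    """Hypothetical container: a model's coreference prediction (e.g., the
    chosen antecedent) for each of the four variants in one quadruple."""
    masc_a: str  # first masculine-name variant (assumed layout)
    masc_b: str  # second masculine-name variant
    fem_a: str   # first feminine-name variant
    fem_b: str   # second feminine-name variant

def within_gender_inconsistency(q: QuadruplePredictions) -> float:
    """Fraction of same-gender variant pairs whose predictions disagree."""
    pairs = [(q.masc_a, q.masc_b), (q.fem_a, q.fem_b)]
    return sum(a != b for a, b in pairs) / len(pairs)

def cross_gender_inconsistency(q: QuadruplePredictions) -> float:
    """Fraction of cross-gender variant pairs whose predictions disagree."""
    pairs = [(q.masc_a, q.fem_a), (q.masc_a, q.fem_b),
             (q.masc_b, q.fem_a), (q.masc_b, q.fem_b)]
    return sum(a != b for a, b in pairs) / len(pairs)

def quadruple_level_bias(quads: List[QuadruplePredictions]) -> float:
    """Mean difference between cross-gender and within-gender inconsistency."""
    diffs = [cross_gender_inconsistency(q) - within_gender_inconsistency(q)
             for q in quads]
    return sum(diffs) / len(diffs)
```

Under this reading, a model that flips its antecedent choice whenever the name's gender changes but never when only the name itself changes would score 1.0, while a model that is equally (in)consistent in both cases would score close to 0.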
Related papers
- Gender Bias Mitigation for Bangla Classification Tasks [2.6285986998314783]
We investigate gender bias in Bangla pretrained language models.
By altering names and gender-specific terms, we ensured these datasets were suitable for detecting and mitigating gender bias.
arXiv Detail & Related papers (2024-11-16T00:04:45Z)
- The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [58.130894823145205]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias.
Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning.
We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z)
- Evaluating Gender Bias of Pre-trained Language Models in Natural Language Inference by Considering All Labels [38.1620443730172]
Discriminatory gender biases have been found in Pre-trained Language Models (PLMs) for multiple languages.
We propose a bias evaluation method for PLMs, called NLI-CoAL, which considers all three labels of Natural Language Inference.
We create datasets in English, Japanese, and Chinese, and successfully validate our bias measure across multiple languages.
arXiv Detail & Related papers (2023-09-18T12:02:21Z)
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z)
- MABEL: Attenuating Gender Bias using Textual Entailment Data [20.489427903240017]
We propose MABEL, an intermediate pre-training approach for mitigating gender bias in contextualized representations.
Key to our approach is the use of a contrastive learning objective on counterfactually augmented, gender-balanced entailment pairs.
We show that MABEL outperforms previous task-agnostic debiasing approaches in terms of fairness.
arXiv Detail & Related papers (2022-10-26T18:36:58Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- The Gap on GAP: Tackling the Problem of Differing Data Distributions in Bias-Measuring Datasets [58.53269361115974]
Diagnostic datasets that can detect biased models are an important prerequisite for bias reduction within natural language processing.
However, undesired patterns in the collected data can make such tests incorrect.
We introduce a theoretically grounded method for weighting test samples to cope with such patterns in the test data.
arXiv Detail & Related papers (2020-11-03T16:50:13Z)
- Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation [94.98656228690233]
We propose a technique that purifies the word embeddings against corpus regularities prior to inferring and removing the gender subspace.
Our approach preserves the distributional semantics of the pre-trained word embeddings while reducing gender bias to a significantly larger degree than prior approaches.
arXiv Detail & Related papers (2020-05-03T02:33:20Z)