An Empirical Survey of the Effectiveness of Debiasing Techniques for
Pre-Trained Language Models
- URL: http://arxiv.org/abs/2110.08527v1
- Date: Sat, 16 Oct 2021 09:40:30 GMT
- Title: An Empirical Survey of the Effectiveness of Debiasing Techniques for
Pre-Trained Language Models
- Authors: Nicholas Meade, Elinor Poole-Dayan, Siva Reddy
- Abstract summary: Recent work has shown that pre-trained language models capture social biases from the text corpora they are trained on.
We survey five recently proposed debiasing techniques: Counterfactual Data Augmentation, Dropout, Iterative Nullspace Projection, Self-Debias, and SentenceDebias.
We quantify the effectiveness of each technique using three different bias benchmarks while also measuring the impact of these techniques on a model's language modeling ability.
- Score: 4.937002982255573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has shown that pre-trained language models capture social biases
from the text corpora they are trained on. This has attracted attention to
developing techniques that mitigate such biases. In this work, we perform an
empirical survey of five recently proposed debiasing techniques: Counterfactual
Data Augmentation (CDA), Dropout, Iterative Nullspace Projection, Self-Debias,
and SentenceDebias. We quantify the effectiveness of each technique using three
different bias benchmarks while also measuring the impact of these techniques
on a model's language modeling ability, as well as its performance on
downstream NLU tasks. We experimentally find that: (1) CDA and Self-Debias are
the strongest of the debiasing techniques, obtaining improved scores on most of
the bias benchmarks; (2) current debiasing techniques do not generalize well
beyond gender bias; and (3) improvements on bias benchmarks such as StereoSet
and CrowS-Pairs by using debiasing strategies are usually accompanied by a
decrease in language modeling ability, making it difficult to determine whether
the bias mitigation is effective.
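To make the strongest-performing technique concrete: in its simplest two-sided form, CDA swaps attribute words (for gender, pairs such as he/she) in the training corpus and continues training on both the original and the rewritten sentences. A minimal sketch of that augmentation step, using an illustrative word list rather than the paper's actual lists:

```python
# Minimal sketch of two-sided Counterfactual Data Augmentation (CDA).
# The gender word list is illustrative, not the one used in the paper.
GENDER_PAIRS = [("he", "she"), ("him", "her"), ("his", "hers"),
                ("man", "woman"), ("father", "mother")]
SWAP = {}
for a, b in GENDER_PAIRS:
    SWAP[a], SWAP[b] = b, a

def counterfactual(sentence: str) -> str:
    """Replace each gendered token with its counterpart, preserving case."""
    out = []
    for tok in sentence.split():
        core = tok.strip(".,!?")
        repl = SWAP.get(core.lower())
        if repl is None:
            out.append(tok)
        else:
            out.append(tok.replace(core, repl.capitalize() if core[:1].isupper() else repl))
    return " ".join(out)

def augment(corpus):
    """Two-sided CDA: keep each original sentence and add its counterfactual."""
    return [s for sent in corpus for s in (sent, counterfactual(sent))]

print(augment(["The doctor said he was tired."]))
# ['The doctor said he was tired.', 'The doctor said she was tired.']
```

A pre-trained model would then be further trained on the augmented corpus with its usual masked or causal language modeling objective.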
Related papers
- Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation [19.06428714669272]
We systematically test how methods for intrinsic debiasing affect neural machine translation models.
We highlight three challenges and mismatches between the debiasing techniques and their end-goal usage.
arXiv Detail & Related papers (2024-06-02T15:57:29Z)
- Projective Methods for Mitigating Gender Bias in Pre-trained Language Models [10.418595661963062]
Projective methods are fast to implement, use a small number of saved parameters, and make no updates to the existing model parameters.
We find that projective methods can be effective at both intrinsic bias and downstream bias mitigation, but that the two outcomes are not necessarily correlated.
arXiv Detail & Related papers (2024-03-27T17:49:31Z)
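The core of any projective method is easy to state: estimate a bias direction or subspace in representation space and subtract each vector's component along it, without touching model weights. A minimal NumPy sketch of the projection step (the subspace estimation itself, e.g. PCA over difference vectors or iteratively trained classifiers, is omitted):

```python
import numpy as np

def debias_projection(embeddings: np.ndarray, bias_basis: np.ndarray) -> np.ndarray:
    """Remove each embedding's component in the bias subspace.

    embeddings: (n, d) representations; bias_basis: (k, d) orthonormal rows."""
    proj = bias_basis.T @ bias_basis       # (d, d) projection onto the subspace
    return embeddings - embeddings @ proj  # x - Px for each row x

# Toy example with a 1-D "gender direction".
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
direction = rng.normal(size=(1, 8))
direction /= np.linalg.norm(direction)
X_debiased = debias_projection(X, direction)
print(np.allclose(X_debiased @ direction.T, 0.0))  # True: orthogonal to the bias direction
```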
- How to be fair? A study of label and selection bias [3.018638214344819]
It is widely accepted that biased data leads to biased and potentially unfair models.
Several measures for bias in data and model predictions have been proposed, as well as bias mitigation techniques.
Despite the myriad of mitigation techniques developed in the past decade, it is still poorly understood under what circumstances which methods work.
arXiv Detail & Related papers (2024-03-21T10:43:55Z)
- Improving Bias Mitigation through Bias Experts in Natural Language Understanding [10.363406065066538]
We propose a new debiasing framework that introduces binary classifiers between the auxiliary model and the main model.
Our proposed strategy improves the bias identification ability of the auxiliary model.
arXiv Detail & Related papers (2023-12-06T16:15:00Z)
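The summary leaves the architecture abstract; one plausible reading, sketched below purely as an assumption (module names and the loss-gating rule are illustrative, not the authors' code), is a set of one-vs-rest binary classifiers over a weak auxiliary encoder whose confidence down-weights shortcut-learnable examples:

```python
import torch
import torch.nn as nn

class BiasExperts(nn.Module):
    """Illustrative sketch: one binary 'bias expert' per class, trained
    one-vs-rest on weak auxiliary features to flag shortcut-predictable
    examples."""
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(feat_dim, 1) for _ in range(num_classes)]
        )

    def forward(self, aux_features: torch.Tensor) -> torch.Tensor:
        # One one-vs-rest logit per class: (batch, num_classes).
        return torch.cat([e(aux_features) for e in self.experts], dim=-1)

# Toy usage: down-weight the main loss where the experts are already
# confident about the label from weak features alone.
experts = BiasExperts(feat_dim=16, num_classes=3)
aux_feats = torch.randn(4, 16)
labels = torch.tensor([0, 2, 1, 0])
confidence = torch.sigmoid(experts(aux_feats)).gather(1, labels[:, None])[:, 0]
per_example_main_loss = torch.ones(4)  # placeholder for the main model's loss
debiased_loss = ((1.0 - confidence) * per_example_main_loss).mean()
```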
- Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions [50.67412723291881]
Societal biases present in pre-trained large language models are a critical issue.
We propose data intervention strategies as a powerful yet simple technique to reduce gender bias in pre-trained models.
arXiv Detail & Related papers (2023-06-07T16:50:03Z)
- An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models [55.14405248920852]
We conduct experiments with prefix tuning, prompt tuning, and adapter tuning on different language models and bias types to evaluate their debiasing performance.
We find that the parameter-efficient methods are effective in mitigating gender bias, where adapter tuning is consistently the most effective.
We also find that prompt tuning is more suitable for GPT-2 than for BERT, and that the methods are less effective when it comes to racial and religious bias.
arXiv Detail & Related papers (2023-06-06T23:56:18Z)
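For concreteness, adapter tuning freezes the pre-trained model and trains only small bottleneck modules inserted into each layer, in the debiasing setting presumably on a signal such as counterfactually augmented text. A minimal sketch of one such adapter; sizes and placement are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual.
    The pre-trained model stays frozen; only these parameters are trained."""
    def __init__(self, hidden: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # residual keeps base behavior

adapter = Adapter(hidden=768)           # e.g. BERT-base hidden size
layer_output = torch.randn(2, 10, 768)  # (batch, seq, hidden) from a frozen layer
adapted = adapter(layer_output)
print(sum(p.numel() for p in adapter.parameters()))  # ~100k params per layer
```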
- Feature-Level Debiased Natural Language Understanding [86.8751772146264]
Existing natural language understanding (NLU) models often rely on dataset biases to achieve high performance on specific datasets.
We propose debiasing contrastive learning (DCT) to mitigate biased latent features while accounting for the dynamic nature of bias.
DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance.
arXiv Detail & Related papers (2022-12-11T06:16:14Z)
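The objective at the heart of such a method is a contrastive loss over latent features. The generic InfoNCE-style sketch below conveys its shape only; DCT's actual sampling of positives and negatives from automatically detected biased features is not reproduced here:

```python
import torch
import torch.nn.functional as F

def contrastive_debias_loss(anchor, positives, negatives, temperature=0.1):
    """Multi-positive InfoNCE: pull the anchor toward less-biased positives
    and away from bias-aligned negatives.
    anchor: (d,), positives: (p, d), negatives: (n, d)."""
    a = F.normalize(anchor, dim=-1)
    sims = torch.cat([
        F.normalize(positives, dim=-1) @ a,  # (p,) similarities to positives
        F.normalize(negatives, dim=-1) @ a,  # (n,) similarities to negatives
    ]) / temperature
    log_prob = sims - torch.logsumexp(sims, dim=0)
    return -log_prob[: positives.shape[0]].mean()

loss = contrastive_debias_loss(
    anchor=torch.randn(128),
    positives=torch.randn(4, 128),
    negatives=torch.randn(16, 128),
)
print(float(loss))
```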
- Optimising Equal Opportunity Fairness in Model Training [60.0947291284978]
Existing debiasing methods, such as adversarial training and removing protected information from representations, have been shown to reduce bias.
We propose two novel training objectives which directly optimise for the widely used criterion of equal opportunity, and show that they are effective in reducing bias while maintaining high performance over two classification tasks.
arXiv Detail & Related papers (2022-05-05T01:57:58Z)
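Equal opportunity requires equal true positive rates across protected groups, so a natural differentiable objective penalizes the gap in soft TPR directly. The surrogate below is one such construction, not necessarily the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def equal_opportunity_penalty(logits, labels, groups):
    """Among positive-labeled examples, penalize the difference in mean
    predicted positive probability (soft TPR) between two groups.
    logits: (n,) floats; labels, groups: (n,) in {0, 1}."""
    probs = torch.sigmoid(logits)
    pos = labels == 1
    tpr_gap = probs[pos & (groups == 0)].mean() - probs[pos & (groups == 1)].mean()
    return tpr_gap.abs()

logits = torch.randn(8, requires_grad=True)
labels = torch.tensor([1, 1, 0, 1, 1, 0, 1, 1])
groups = torch.tensor([0, 1, 0, 1, 0, 1, 1, 0])
loss = F.binary_cross_entropy_with_logits(logits, labels.float()) \
    + 1.0 * equal_opportunity_penalty(logits, labels, groups)
loss.backward()  # the penalty is differentiable, so it trains end to end
```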
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
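Instance reweighting of this kind fits in a few lines: weight each example inversely to the frequency of its (demographic, label) cell, so the two variables decorrelate in the effective training distribution. A minimal sketch assuming this simple inverse-frequency scheme, which may differ in detail from the paper's:

```python
from collections import Counter

def balancing_weights(demographics, labels):
    """Give each (demographic, label) cell equal total weight so that
    author demographics carry no information about the label."""
    counts = Counter(zip(demographics, labels))
    n, k = len(labels), len(counts)
    return [n / (k * counts[(d, y)]) for d, y in zip(demographics, labels)]

demo = ["f", "f", "f", "m"]   # toy author demographics
y    = [1, 1, 0, 0]           # toy task labels
print(balancing_weights(demo, y))
# ~[0.67, 0.67, 1.33, 1.33]: rare cells are up-weighted. These weights
# would multiply the per-example loss during training.
```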
- RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models [37.98671828283487]
Text representation models are prone to exhibit a range of societal biases.
Recent work has predominantly focused on measuring and mitigating bias in pretrained language models.
We present RedditBias, the first conversational data set grounded in actual human conversations from Reddit.
arXiv Detail & Related papers (2021-06-07T11:22:39Z)
- Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases.
First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method.
The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
arXiv Detail & Related papers (2020-05-10T17:56:10Z)
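The bag-of-words idea is commonly paired with a product-of-experts loss, and the sketch below assumes that pairing (a standard instantiation, not necessarily the paper's exact combination): a weak sub-model absorbs lexical shortcuts, and the main model is trained on the combined log-probabilities, so it only gains by modeling what the shortcut model cannot.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BagOfWords(nn.Module):
    """Weak sub-model that only sees word presence; whatever it predicts
    well is, by construction, a lexical shortcut."""
    def __init__(self, vocab_size: int, num_classes: int):
        super().__init__()
        self.linear = nn.Linear(vocab_size, num_classes)

    def forward(self, bow: torch.Tensor) -> torch.Tensor:
        return self.linear(bow)

def poe_loss(main_logits, bow_logits, labels):
    # Product of experts: sum the log-probabilities of the two models.
    combined = F.log_softmax(main_logits, -1) + F.log_softmax(bow_logits, -1)
    return F.cross_entropy(combined, labels)

bow_model = BagOfWords(vocab_size=1000, num_classes=3)
bow_feats = torch.bernoulli(torch.full((4, 1000), 0.05))  # toy word-presence vectors
main_logits = torch.randn(4, 3, requires_grad=True)       # stand-in for the main NLI model
labels = torch.tensor([0, 1, 2, 0])
loss = poe_loss(main_logits, bow_model(bow_feats).detach(), labels)
loss.backward()  # only the main model receives gradients here
```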
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.