Debiasing Methods in Natural Language Understanding Make Bias More
Accessible
- URL: http://arxiv.org/abs/2109.04095v1
- Date: Thu, 9 Sep 2021 08:28:22 GMT
- Authors: Michael Mendelson and Yonatan Belinkov
- Abstract summary: Recent debiasing methods in natural language understanding (NLU) improve performance on such datasets by pressuring models into making unbiased predictions.
We propose a general probing-based framework that allows for post-hoc interpretation of biases in language models.
We show that, counter-intuitively, the more a language model is pushed towards a debiased regime, the more bias is actually encoded in its inner representations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model robustness to bias is often determined by the generalization on
carefully designed out-of-distribution datasets. Recent debiasing methods in
natural language understanding (NLU) improve performance on such datasets by
pressuring models into making unbiased predictions. An underlying assumption
behind such methods is that this also leads to the discovery of more robust
features in the model's inner representations. We propose a general
probing-based framework that allows for post-hoc interpretation of biases in
language models, and use an information-theoretic approach to measure the
extractability of certain biases from the model's representations. We
experiment with several NLU datasets and known biases, and show that,
counter-intuitively, the more a language model is pushed towards a debiased
regime, the more bias is actually encoded in its inner representations.
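The paper measures bias extractability with an information-theoretic probe over model representations. As an illustrative sketch only (not the authors' exact framework), a linear probe's held-out accuracy on synthetic "representations" gives a crude stand-in for extractability: the more strongly a bias label leaks into the representation, the better the probe does. All data and names below are hypothetical.

```python
import math
import random

random.seed(0)

def make_reps(n, leak_prob):
    # toy "model representations": 4-dim vectors in which one coordinate
    # leaks a binary bias label with probability leak_prob
    data = []
    for _ in range(n):
        bias = random.randint(0, 1)
        x = [random.gauss(0.0, 1.0) for _ in range(4)]
        if random.random() < leak_prob:
            x[0] = 2.0 if bias else -2.0
        data.append((x, bias))
    return data

def probe_accuracy(train, test, epochs=200, lr=0.1):
    # a linear probe (logistic regression trained by SGD); its held-out
    # accuracy is a crude proxy for how extractable the bias label is
    w, b = [0.0] * 4, 0.0
    for _ in range(epochs):
        for x, y in train:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))
            g = p - y  # gradient of the log-loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    hits = sum((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y == 1)
               for x, y in test)
    return hits / len(test)

weak = make_reps(400, leak_prob=0.3)    # bias weakly encoded
strong = make_reps(400, leak_prob=0.9)  # bias strongly encoded
acc_weak = probe_accuracy(weak[:300], weak[300:])
acc_strong = probe_accuracy(strong[:300], strong[300:])
print(f"weak: {acc_weak:.2f}  strong: {acc_strong:.2f}")
```

The paper's actual measure is information-theoretic (e.g. description-length-based) rather than raw probe accuracy, but the intuition is the same: a probe that recovers the bias more easily indicates more bias encoded in the representation.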
Related papers
- Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models [1.787433808079955]
Large language models (LLMs) have been observed to perpetuate unwanted biases in training data.
In this paper, we mitigate bias by leveraging small biased and anti-biased expert models to obtain a debiasing signal.
Experiments on mitigating gender, race, and religion biases show a reduction in bias on several local and global bias metrics.
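One common way to form such a debiasing signal from small expert models (a DExperts-style combination, which may differ from this paper's exact method) is to shift the base model's logits by the difference between an anti-biased and a biased expert; `alpha` below is a hypothetical steering weight.

```python
def debiased_logits(base, biased, anti_biased, alpha=1.0):
    # steer the base model's logits away from a small biased expert and
    # toward an anti-biased one: l = l_base + alpha * (l_anti - l_bias)
    return [lb + alpha * (la - lbi)
            for lb, lbi, la in zip(base, biased, anti_biased)]

# hypothetical logits over a 3-token vocabulary
base = [1.0, 1.0, 1.0]
biased = [2.0, 0.0, 0.0]       # expert that amplifies the bias
anti_biased = [0.0, 2.0, 0.0]  # expert trained against the bias
print(debiased_logits(base, biased, anti_biased))  # [-1.0, 3.0, 1.0]
```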
arXiv Detail & Related papers (2024-12-02T16:56:08Z)
- Looking at Model Debiasing through the Lens of Anomaly Detection [11.113718994341733]
Deep neural networks are sensitive to bias in the data.
In this work, we show the importance of accurately predicting the bias-conflicting and bias-aligned samples.
We propose a new bias identification method based on anomaly detection.
arXiv Detail & Related papers (2024-07-24T17:30:21Z) - IBADR: an Iterative Bias-Aware Dataset Refinement Framework for
Debiasing NLU models [52.03761198830643]
We propose IBADR, an Iterative Bias-Aware dataset Refinement framework.
We first train a shallow model to quantify the bias degree of samples in the pool.
Then, we pair each sample with a bias indicator representing its bias degree, and use these extended samples to train a sample generator.
In this way, this generator can effectively learn the correspondence relationship between bias indicators and samples.
arXiv Detail & Related papers (2023-11-01T04:50:38Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
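Plain orthogonal projection illustrates the idea (the paper uses a calibrated projection matrix, which this sketch omits): remove the component of an embedding along a bias direction. The embedding and bias direction below are made up.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def project_out(embedding, bias_direction):
    # remove the component of the embedding along the (unit) bias
    # direction: e' = e - (e . b) b
    b = normalize(bias_direction)
    dot = sum(ei * bi for ei, bi in zip(embedding, b))
    return [ei - dot * bi for ei, bi in zip(embedding, b)]

# hypothetical text embedding and bias direction
e = [0.6, 0.8, 0.1]
b = [1.0, 0.0, 0.0]
e_debiased = project_out(e, b)
print(e_debiased)  # component along b is zeroed out
```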
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Feature-Level Debiased Natural Language Understanding [86.8751772146264]
Existing natural language understanding (NLU) models often rely on dataset biases to achieve high performance on specific datasets.
We propose debiasing contrastive learning (DCT) to mitigate biased latent features while accounting for the dynamic nature of bias.
DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance.
arXiv Detail & Related papers (2022-12-11T06:16:14Z)
- A Generative Approach for Mitigating Structural Biases in Natural Language Inference [24.44419010439227]
In this work, we reformulate the NLI task as a generative task, where a model is conditioned on the biased subset of the input and the label.
We show that this approach is highly robust to large amounts of bias.
We find that generative models are difficult to train and they generally perform worse than discriminative baselines.
arXiv Detail & Related papers (2021-08-31T17:59:45Z)
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
- Towards Debiasing NLU Models from Unknown Biases [70.31427277842239]
NLU models often exploit biases to achieve high dataset-specific performance without properly learning the intended task.
We present a self-debiasing framework that prevents models from mainly utilizing biases without knowing them in advance.
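A standard recipe in this line of work (illustrative; not necessarily this paper's exact weighting) turns a shallow biased model's confidence into example weights, so that examples the biased model already solves contribute little to training the main model.

```python
def example_weights(bias_probs_correct):
    # w_i = 1 - p_bias(y_i | x_i): examples the shallow biased model
    # already predicts confidently contribute little to training
    return [1.0 - p for p in bias_probs_correct]

# hypothetical bias-model probabilities for the gold label of 4 examples
weights = example_weights([0.95, 0.5, 0.1, 0.99])
print(weights)  # heavily biased examples get near-zero weight
```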
arXiv Detail & Related papers (2020-09-25T15:49:39Z)
- Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases.
First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method.
The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
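One common way to keep a main model from learning features already captured by a bias sub-model is a product-of-experts ensemble, sketched below; the combination rule is illustrative and may differ from the paper's exact training setup.

```python
import math

def log_softmax(logits):
    # numerically stable log-softmax
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def poe_loss(main_logits, bias_logits, label):
    # product-of-experts: train the main model on the combined
    # distribution, so it gains little by re-learning the bias
    combined = [a + b for a, b in zip(main_logits, bias_logits)]
    return -log_softmax(combined)[label]

# toy example: the bias model is already confident in the biased label 0,
# so the main model's loss is already small even if it stays neutral
bias_logits = [4.0, 0.0, 0.0]
loss_aligned = poe_loss([2.0, 0.0, 0.0], bias_logits, 0)
loss_neutral = poe_loss([0.0, 0.0, 0.0], bias_logits, 0)
print(loss_aligned, loss_neutral)
```

Because the combined prediction is near-certain whenever the bias model is, the gradient pushes the main model to explain only what the bias model cannot.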
arXiv Detail & Related papers (2020-05-10T17:56:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.