Generating Data to Mitigate Spurious Correlations in Natural Language
Inference Datasets
- URL: http://arxiv.org/abs/2203.12942v1
- Date: Thu, 24 Mar 2022 09:08:05 GMT
- Title: Generating Data to Mitigate Spurious Correlations in Natural Language
Inference Datasets
- Authors: Yuxiang Wu, Matt Gardner, Pontus Stenetorp and Pradeep Dasigi
- Abstract summary: Natural language processing models often exploit spurious correlations between task-independent features and labels in datasets to perform well only within the distributions they are trained on.
We propose to tackle this problem by generating a debiased version of a dataset, which can then be used to train a debiased, off-the-shelf model.
Our approach consists of 1) a method for training data generators to generate high-quality, label-consistent data samples; and 2) a filtering mechanism for removing data points that contribute to spurious correlations.
- Score: 27.562256973255728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language processing models often exploit spurious correlations
between task-independent features and labels in datasets to perform well only
within the distributions they are trained on, while not generalising to
different task distributions. We propose to tackle this problem by generating a
debiased version of a dataset, which can then be used to train a debiased,
off-the-shelf model, by simply replacing its training data. Our approach
consists of 1) a method for training data generators to generate high-quality,
label-consistent data samples; and 2) a filtering mechanism for removing data
points that contribute to spurious correlations, measured in terms of
z-statistics. We generate debiased versions of the SNLI and MNLI datasets, and
we evaluate on a large suite of debiased, out-of-distribution, and adversarial
test sets. Results show that models trained on our debiased datasets generalise
better than those trained on the original datasets in all settings. On the
majority of the datasets, our method outperforms or performs comparably to
previous state-of-the-art debiasing strategies, and when combined with an
orthogonal technique, product-of-experts, it improves further and outperforms the
previous best results on SNLI-hard and MNLI-hard.
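The filtering step relies on z-statistics: for each simple feature (for example, a word appearing in the hypothesis), the empirical probability of a label given that feature is compared against the class prior, and examples carrying features whose deviation is statistically significant are removed. The following is a minimal sketch of that idea, assuming the standard one-proportion z-test over word/label co-occurrence counts; the function names, the minimum-count cutoff of 20, and the threshold of 2.0 are illustrative assumptions, not the authors' exact settings.

```python
import math
from collections import Counter, defaultdict

def z_statistic(count_with_label, count_total, prior):
    """One-proportion z-test: deviation of p_hat(label | feature) from the class prior."""
    p_hat = count_with_label / count_total
    return (p_hat - prior) / math.sqrt(prior * (1 - prior) / count_total)

def filter_spurious(examples, z_threshold=2.0, prior=1 / 3, min_count=20):
    """Drop examples containing a (word, label) pair whose z-statistic exceeds the threshold.

    `examples` is a list of (hypothesis_tokens, label) pairs; prior = 1/3 assumes a
    balanced three-class NLI label distribution. Illustrative sketch only.
    """
    word_total = Counter()
    word_label = defaultdict(Counter)
    for tokens, label in examples:
        for w in set(tokens):
            word_total[w] += 1
            word_label[w][label] += 1

    def is_spurious(tokens, label):
        return any(
            word_total[w] >= min_count  # skip rare words with unstable estimates
            and z_statistic(word_label[w][label], word_total[w], prior) > z_threshold
            for w in set(tokens)
        )

    return [(tokens, label) for tokens, label in examples if not is_spurious(tokens, label)]
```

The same test can be run over other handcrafted features (e.g., hypothesis bigrams or overlap statistics), and a multiple-testing correction can be applied when many features are tested at once.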
Related papers
- Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLMs) have been used for diverse tasks, but they do not capture the correct correlation between the features and the target variable.
We propose an LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data.
Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z)
- A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, imposing a negative impact on training efficiency and model performance.
Data selection has shown promise in identifying the most representative samples from the entire dataset.
We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
arXiv Detail & Related papers (2024-10-15T03:00:58Z)
- Feature-Level Debiased Natural Language Understanding [86.8751772146264]
Existing natural language understanding (NLU) models often rely on dataset biases to achieve high performance on specific datasets.
We propose debiasing contrastive learning (DCT) to mitigate biased latent features, addressing the dynamic nature of bias that existing methods neglect.
DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance.
arXiv Detail & Related papers (2022-12-11T06:16:14Z)
- Too Fine or Too Coarse? The Goldilocks Composition of Data Complexity for Robust Left-Right Eye-Tracking Classifiers [0.0]
We train machine learning models utilizing a mixed dataset composed of both fine- and coarse-grain data.
For our purposes, finer-grain data refers to data collected using more complex methods, whereas coarser-grain data refers to data collected using simpler methods.
arXiv Detail & Related papers (2022-08-24T23:18:08Z)
- Automatically Identifying Semantic Bias in Crowdsourced Natural Language Inference Datasets [78.6856732729301]
We introduce a model-driven, unsupervised technique to find "bias clusters" in a learned embedding space of hypotheses in NLI datasets.
Interventions and additional rounds of labeling can then be performed to ameliorate the semantic bias in a dataset's hypothesis distribution.
arXiv Detail & Related papers (2021-12-16T22:49:01Z)
- CrossAug: A Contrastive Data Augmentation Method for Debiasing Fact Verification Models [14.75693099720436]
We propose CrossAug, a contrastive data augmentation method for debiasing fact verification models.
We employ a two-stage augmentation pipeline to generate new claims and evidence from existing samples.
The generated samples are then paired cross-wise with the original pair, forming contrastive samples that facilitate the model to rely less on spurious patterns.
arXiv Detail & Related papers (2021-09-30T13:19:19Z)
- Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles [66.15398165275926]
We propose a method that can automatically detect and ignore dataset-specific patterns, which we call dataset biases.
Our method trains a lower-capacity model in an ensemble with a higher-capacity model (a product-of-experts-style sketch of this idea appears after this list).
We show improvement in all settings, including a 10 point gain on the visual question answering dataset.
arXiv Detail & Related papers (2020-11-07T22:20:03Z)
- Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases.
First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method.
The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
arXiv Detail & Related papers (2020-05-10T17:56:10Z)
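Several of the entries above (the mixed-capacity ensembles and the bag-of-words sub-model), as well as the product-of-experts combination mentioned in the main abstract, share one mechanism: a weak bias-only model is run alongside the main model, and the training loss is adjusted so the main model is not rewarded for learning what the bias model already captures. Below is a minimal PyTorch sketch of a generic product-of-experts loss; the function name is illustrative, and detaching the bias logits is a common convention rather than a detail confirmed by any one of these papers.

```python
import torch
import torch.nn.functional as F

def product_of_experts_loss(main_logits, bias_logits, labels):
    """Combine main-model and bias-model log-probabilities before computing the loss.

    Summing log-probabilities multiplies the two distributions (a product of experts);
    cross_entropy re-normalises the sum, so the gradient pushes the main model to
    explain whatever the frozen (detached) bias model cannot. At test time only
    main_logits are used.
    """
    combined = F.log_softmax(main_logits, dim=-1) + F.log_softmax(bias_logits.detach(), dim=-1)
    return F.cross_entropy(combined, labels)

# Illustrative usage with random tensors standing in for model outputs:
main = torch.randn(8, 3, requires_grad=True)  # main NLI model logits (batch of 8, 3 classes)
bias = torch.randn(8, 3)                      # e.g. hypothesis-only bias model logits
gold = torch.randint(0, 3, (8,))
loss = product_of_experts_loss(main, bias, gold)
loss.backward()                               # gradients flow only into the main model
```

Here the bias model is typically a shallow or input-restricted classifier (hypothesis-only, bag-of-words, or a lower-capacity network), which is what makes the residual signal left for the main model informative.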
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.