Counterfactually-Augmented SNLI Training Data Does Not Yield Better
Generalization Than Unaugmented Data
- URL: http://arxiv.org/abs/2010.04762v1
- Date: Fri, 9 Oct 2020 18:44:02 GMT
- Authors: William Huang, Haokun Liu, and Samuel R. Bowman
- Abstract summary: Counterfactual augmentation of natural language understanding data through standard crowdsourcing techniques does not appear to be an effective way of collecting training data.
We build upon Khashabi et al. (2020) by using English natural language inference data to test model generalization and robustness.
- Score: 27.738670027154555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A growing body of work shows that models exploit annotation artifacts to
achieve state-of-the-art performance on standard crowdsourced
benchmarks---datasets collected from crowdworkers to create an evaluation
task---while still failing on out-of-domain examples for the same task. Recent
work has explored the use of counterfactually-augmented data---data built by
minimally editing a set of seed examples to yield counterfactual labels---to
augment training data associated with these benchmarks and build more robust
classifiers that generalize better. However, Khashabi et al. (2020) find that
this type of augmentation yields little benefit on reading comprehension tasks
when controlling for dataset size and cost of collection. We build upon this
work by using English natural language inference data to test model
generalization and robustness and find that models trained on a
counterfactually-augmented SNLI dataset do not generalize better than
unaugmented datasets of similar size and that counterfactual augmentation can
hurt performance, yielding models that are less robust to challenge examples.
Counterfactual augmentation of natural language understanding data through
standard crowdsourcing techniques does not appear to be an effective way of
collecting training data and further innovation is required to make this
general line of work viable.
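The size-controlled comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' protocol: the function name `build_size_matched_sets`, the example dictionary keys, and the padding strategy (topping up the baseline with extra unaugmented examples) are all assumptions made for the sketch.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical example schema: {"premise": ..., "hypothesis": ..., "label": ...}
Example = Dict[str, str]


def build_size_matched_sets(
    seeds: List[Example],
    counterfactuals: List[Example],
    unaugmented_pool: List[Example],
    rng_seed: int = 0,
) -> Tuple[List[Example], List[Example]]:
    """Return (augmented, baseline) training sets of equal size.

    augmented = seed examples plus their counterfactual edits.
    baseline  = the same seeds plus randomly drawn unaugmented examples,
                so both sets contain the same number of examples.
    """
    augmented = list(seeds) + list(counterfactuals)

    # Exclude the seeds from the padding pool so the baseline has no duplicates.
    seed_keys = {(ex["premise"], ex["hypothesis"]) for ex in seeds}
    extra_pool = [
        ex
        for ex in unaugmented_pool
        if (ex["premise"], ex["hypothesis"]) not in seed_keys
    ]

    n_extra = len(augmented) - len(seeds)
    if n_extra > len(extra_pool):
        raise ValueError("Not enough unaugmented examples to match the size.")

    rng = random.Random(rng_seed)  # fixed seed for a reproducible sample
    baseline = list(seeds) + rng.sample(extra_pool, n_extra)
    return augmented, baseline
```

Training the same model on each returned set and evaluating both on out-of-domain or challenge data keeps dataset size from confounding the comparison.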
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Combining Public Human Activity Recognition Datasets to Mitigate Labeled Data Scarcity [1.274578243851308]
We propose a novel strategy to combine publicly available datasets with the goal of learning a generalized HAR model.
Our experimental evaluation, which includes experiments with different state-of-the-art neural network architectures, shows that combining public datasets can significantly reduce the number of labeled samples needed.
arXiv Detail & Related papers (2023-06-23T18:51:22Z)
- STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances.
We design fine-grained step-by-step instructions to obtain the initial data instances.
Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z)
- Augmenting NLP data to counter Annotation Artifacts for NLI Tasks [0.0]
Large pre-trained NLP models achieve high performance on benchmark datasets but do not actually "solve" the underlying task.
We explore this phenomenon by first using contrast and adversarial examples to understand limitations to the model's performance.
We then propose a data augmentation technique to fix this bias and measure its effectiveness.
arXiv Detail & Related papers (2023-02-09T15:34:53Z)
- Robust Task-Oriented Dialogue Generation with Contrastive Pre-training and Adversarial Filtering [17.7709632238066]
Data artifacts incentivize machine learning models to learn non-transferable generalizations.
We investigate whether popular datasets such as MultiWOZ contain such data artifacts.
We propose a contrastive learning based framework to encourage the model to ignore these cues and focus on learning generalisable patterns.
arXiv Detail & Related papers (2022-05-20T03:13:02Z)
- Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization [63.21819285337555]
We show that NLI models can be effective for this task when the training data is augmented with high-quality task-oriented examples.
We introduce Falsesum, a data generation pipeline leveraging a controllable text generation model to perturb human-annotated summaries.
We show that models trained on a Falsesum-augmented NLI dataset improve the state-of-the-art performance across four benchmarks for detecting factual inconsistency in summarization.
arXiv Detail & Related papers (2022-05-12T10:43:42Z)
- Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis [17.811597734603144]
We propose an approach to automatically generating counterfactual data for data augmentation and explanation.
A comprehensive evaluation on several different datasets, using a variety of state-of-the-art benchmarks, demonstrates how our approach can achieve significant improvements in model performance.
arXiv Detail & Related papers (2021-06-29T10:27:01Z)
- Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)