Related papers: How Does Data Corruption Affect Natural Language Understanding Models? A Study on GLUE datasets

URL: http://arxiv.org/abs/2201.04467v1
Date: Wed, 12 Jan 2022 13:35:53 GMT
Title: How Does Data Corruption Affect Natural Language Understanding Models? A Study on GLUE datasets
Authors: Aarne Talman, Marianna Apidianaki, Stergios Chatzikyriakidis, Jörg Tiedemann
Abstract summary: We show that performance remains high for most GLUE tasks when the models are fine-tuned or tested on corrupted data. Our proposed data transformations can be used as a diagnostic tool for assessing the extent to which a specific dataset constitutes a proper testbed for evaluating models' language understanding capabilities.
Score: 4.645287693363387
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A central question in natural language understanding (NLU) research is whether high performance demonstrates the models' strong reasoning capabilities. We present an extensive series of controlled experiments where pre-trained language models are exposed to data that have undergone specific corruption transformations. The transformations involve removing instances of specific word classes and often lead to non-sensical sentences. Our results show that performance remains high for most GLUE tasks when the models are fine-tuned or tested on corrupted data, suggesting that the models leverage other cues for prediction even in non-sensical contexts. Our proposed data transformations can be used as a diagnostic tool for assessing the extent to which a specific dataset constitutes a proper testbed for evaluating models' language understanding capabilities.
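As a rough illustration of the kind of transformation the abstract describes (removing every instance of one word class from a sentence), here is a minimal Python sketch. The function name `remove_word_class`, the coarse tag names, and the hand-tagged toy sentence are assumptions made for this example; the listing does not say which tagger or pipeline the authors used.

```python
from typing import List, Tuple

# A token paired with a coarse part-of-speech tag (tag names are assumed).
TaggedToken = Tuple[str, str]


def remove_word_class(tagged: List[TaggedToken], word_class: str) -> str:
    """Drop every token tagged with `word_class` and rejoin the remainder."""
    kept = [word for word, tag in tagged if tag != word_class]
    return " ".join(kept)


# Hand-tagged toy sentence; a real pipeline would obtain tags from a POS tagger.
sentence: List[TaggedToken] = [
    ("The", "DET"),
    ("cat", "NOUN"),
    ("chased", "VERB"),
    ("a", "DET"),
    ("small", "ADJ"),
    ("mouse", "NOUN"),
]

no_nouns = remove_word_class(sentence, "NOUN")
no_verbs = remove_word_class(sentence, "VERB")
print(no_nouns)
print(no_verbs)
```

Applying such a function to the training or test split of a task and comparing accuracy before and after is one way to read the diagnostic the abstract proposes: if accuracy barely drops on the corrupted text, the model is probably relying on cues other than sentence meaning.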

Related papers

Hey, That's My Data! Label-Only Dataset Inference in Large Language Models [63.35066172530291]
CatShift is a label-only dataset-inference framework. It capitalizes on catastrophic forgetting: the tendency of an LLM to overwrite previously learned knowledge when exposed to new data.
arXiv Detail & Related papers (2025-06-06T13:02:59Z)
Relation-based Counterfactual Data Augmentation and Contrastive Learning for Robustifying Natural Language Inference Models [0.0]
We propose a method in which we use token-based and sentence-based augmentation methods to generate counterfactual sentence pairs. We show that the proposed method can improve the performance and robustness of the NLI model.
arXiv Detail & Related papers (2024-10-28T03:43:25Z)
Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals [91.59906995214209]
We propose a new evaluation method, the Counterfactual Attentiveness Test (CAT). CAT uses counterfactuals by replacing part of the input with its counterpart from a different example, expecting an attentive model to change its prediction. We show that GPT3 becomes less attentive with an increased number of demonstrations, while its accuracy on the test data improves.
arXiv Detail & Related papers (2023-11-16T06:27:35Z)
Eeny, meeny, miny, moe. How to choose data for morphological inflection [8.914777617216862]
This paper explores four sampling strategies for the task of morphological inflection using a Transformer model. We investigate the robustness of each strategy across 30 typologically diverse languages. Our results show a clear benefit to selecting data based on model confidence and entropy.
arXiv Detail & Related papers (2022-10-26T04:33:18Z)
Measuring Causal Effects of Data Statistics on Language Model's 'Factual' Predictions [59.284907093349425]
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models. We provide a language for describing how training data influences predictions, through a causal framework. Our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone.
arXiv Detail & Related papers (2022-07-28T17:36:24Z)
An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models. We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods. We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments. The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance [3.7024660695776066]
We propose a new diagnostic test suite that makes it possible to assess whether a dataset constitutes a good testbed for evaluating models' meaning understanding capabilities. We specifically apply controlled corruption transformations to widely used benchmarks (MNLI and ANLI). A large decrease in model accuracy indicates that the original dataset provides a proper challenge to the models' reasoning capabilities.
arXiv Detail & Related papers (2021-04-10T12:28:07Z)
Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation [14.92157586545743]
This paper presents a number of techniques for making models more robust in the domain of causal reasoning. We show a statistically significant improvement in performance on both datasets, even with only a small number of additionally generated data points.
arXiv Detail & Related papers (2021-01-13T09:55:29Z)
Detecting and Exorcising Statistical Demons from Language Models with Anti-Models of Negative Data [13.392212395386933]
We find that within a model family, as the number of parameters, training epochs, and data set size increase, so does a model's ability to generalize to negative n-gram data. We propose a form of inductive bias that attenuates such undesirable signals with negative data distributions automatically learned from positive data.
arXiv Detail & Related papers (2020-10-22T16:45:32Z)
Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict. This work compares a neural model and character language models with varying amounts of target language data. Our usage scenario is interactive correction with almost no training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
Data Augmentation for Spoken Language Understanding via Pretrained Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity. We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances.
arXiv Detail & Related papers (2020-04-29T04:07:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.