Are Sample-Efficient NLP Models More Robust?
- URL: http://arxiv.org/abs/2210.06456v2
- Date: Tue, 30 May 2023 20:33:26 GMT
- Title: Are Sample-Efficient NLP Models More Robust?
- Authors: Nelson F. Liu and Ananya Kumar and Percy Liang and Robin Jia
- Abstract summary: We investigate the relationship between sample efficiency (the amount of data needed to reach a given in-distribution (ID) accuracy) and robustness (how models fare under out-of-distribution (OOD) evaluation).
We find that higher sample efficiency is only correlated with better average OOD robustness on some modeling interventions and tasks, but not others.
These results suggest that general-purpose methods for improving sample efficiency are unlikely to yield universal OOD robustness improvements, since such improvements are highly dataset- and task-dependent.
- Score: 90.54786862811183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent results in image classification and extractive question answering have
observed that pre-trained models trained on less in-distribution data have
better out-of-distribution performance. However, it is unclear how broadly
these trends hold. We conduct a large empirical study across three tasks, three
broadly-applicable modeling interventions (increasing model size, using a
different adaptation method, and pre-training on more data), and 14 diverse
datasets to investigate the relationship between sample efficiency (amount of
data needed to reach a given ID accuracy) and robustness (how models fare on
OOD evaluation). We find that higher sample efficiency is only correlated with
better average OOD robustness on some modeling interventions and tasks, but not
others. On individual datasets, models with lower sample efficiency can even be
more robust. These results suggest that general-purpose methods for improving
sample efficiency are unlikely to yield universal OOD robustness improvements,
since such improvements are highly dataset- and task-dependent. Even in an era
of large, multi-purpose pretrained models, task-specific decisions may often be
necessary for OOD generalization.
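To make the abstract's two operational definitions concrete, the short Python sketch below (not from the paper; the learning-curve numbers, model names, and the 80% target are made-up assumptions) measures a model's sample efficiency as the training-set size needed to reach a fixed ID accuracy and pairs it with that model's OOD accuracy, which is the quantity whose correlation the paper studies.
```python
# A minimal sketch (not the authors' code) of how "sample efficiency" and
# "OOD robustness" could be operationalized for a set of models. All
# learning-curve numbers and the target ID accuracy are hypothetical.
import numpy as np

def samples_to_reach(target_id_acc, train_sizes, id_accs):
    """Smallest training-set size at which ID accuracy reaches the target
    (linear interpolation between measured points; inf if never reached)."""
    train_sizes, id_accs = np.asarray(train_sizes), np.asarray(id_accs)
    if id_accs.max() < target_id_acc:
        return np.inf
    return float(np.interp(target_id_acc, id_accs, train_sizes))

# Hypothetical learning-curve measurements for two modeling interventions.
models = {
    "base":  {"sizes": [500, 2000, 8000, 32000], "id": [0.61, 0.72, 0.80, 0.85], "ood": 0.70},
    "large": {"sizes": [500, 2000, 8000, 32000], "id": [0.68, 0.78, 0.84, 0.88], "ood": 0.74},
}

target = 0.80  # a fixed ID accuracy at which models are compared
for name, m in models.items():
    eff = samples_to_reach(target, m["sizes"], m["id"])
    print(f"{name}: ~{eff:.0f} examples to reach {target:.0%} ID accuracy, OOD accuracy {m['ood']:.0%}")
```
In this framing, the paper's finding is that smaller `samples_to_reach` values (higher sample efficiency) correlate with higher OOD accuracy only for some interventions and tasks, not universally.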
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z) - A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, which negatively impacts training efficiency and model performance.
Data selection has shown promise in identifying the most representative samples from the entire dataset.
We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
arXiv Detail & Related papers (2024-10-15T03:00:58Z) - Clarifying Myths About the Relationship Between Shape Bias, Accuracy, and Robustness [18.55761892159021]
Deep learning models can perform well when evaluated on images from the same distribution as the training set.
Applying small amounts of blur to a model's input image, or feeding the model out-of-distribution (OOD) data, can significantly reduce the model's accuracy.
Data augmentation is one of the well-practiced methods to improve model robustness against OOD data.
arXiv Detail & Related papers (2024-06-07T15:21:00Z) - Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis,
and LLMs Evaluations [111.88727295707454]
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP.
We propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts.
We conduct experiments on pre-trained language models for analysis and evaluation of OOD robustness.
arXiv Detail & Related papers (2023-06-07T17:47:03Z) - Effective Robustness against Natural Distribution Shifts for Models with
Different Training Data [113.21868839569]
"Effective robustness" measures the extra out-of-distribution robustness beyond what can be predicted from the in-distribution (ID) performance.
We propose a new evaluation metric to evaluate and compare the effective robustness of models trained on different data.
arXiv Detail & Related papers (2023-02-02T19:28:41Z) - Exploring The Landscape of Distributional Robustness for Question
Answering Models [47.178481044045505]
Our investigation spans over 350 models and 16 question answering datasets.
We find that, in many cases, model variations do not affect robustness.
We release all evaluations to encourage researchers to further analyze robustness trends for question answering models.
arXiv Detail & Related papers (2022-10-22T18:17:31Z) - Understanding and Testing Generalization of Deep Networks on
Out-of-Distribution Data [30.471871571256198]
Deep network models perform excellently on In-Distribution data, but can significantly fail on Out-Of-Distribution data.
This study analyzes the problem of experimental ID testing and designs an OOD test paradigm.
arXiv Detail & Related papers (2021-11-17T15:29:07Z) - Complementary Ensemble Learning [1.90365714903665]
We derive a technique to improve the performance of state-of-the-art deep learning models.
Specifically, we train auxiliary models that are able to complement the uncertainty of a state-of-the-art model.
arXiv Detail & Related papers (2021-11-09T03:23:05Z) - The Evolution of Out-of-Distribution Robustness Throughout Fine-Tuning [25.85044477227461]
Models that are more accurate on out-of-distribution data than a baseline ID-vs-OOD accuracy trend would predict exhibit "effective robustness".
We find that models pre-trained on larger datasets exhibit effective robustness during training that vanishes at convergence.
We discuss several strategies for scaling effective robustness to the high-accuracy regime to improve the out-of-distribution accuracy of state-of-the-art models.
arXiv Detail & Related papers (2021-06-30T06:21:42Z) - On the Efficacy of Adversarial Data Collection for Question Answering:
Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z)