Are Sample-Efficient NLP Models More Robust?
- URL: http://arxiv.org/abs/2210.06456v2
- Date: Tue, 30 May 2023 20:33:26 GMT
- Title: Are Sample-Efficient NLP Models More Robust?
- Authors: Nelson F. Liu and Ananya Kumar and Percy Liang and Robin Jia
- Abstract summary: We investigate the relationship between sample efficiency (amount of data needed to reach a given ID accuracy) and robustness (how models fare on OOD evaluation).
We find that higher sample efficiency is only correlated with better average OOD robustness on some modeling interventions and tasks, but not others.
These results suggest that general-purpose methods for improving sample efficiency are unlikely to yield universal OOD robustness improvements, since such improvements are highly dataset- and task-dependent.
- Score: 90.54786862811183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent results in image classification and extractive question answering have
observed that pre-trained models trained on less in-distribution data have
better out-of-distribution performance. However, it is unclear how broadly
these trends hold. We conduct a large empirical study across three tasks, three
broadly-applicable modeling interventions (increasing model size, using a
different adaptation method, and pre-training on more data), and 14 diverse
datasets to investigate the relationship between sample efficiency (amount of
data needed to reach a given ID accuracy) and robustness (how models fare on
OOD evaluation). We find that higher sample efficiency is only correlated with
better average OOD robustness on some modeling interventions and tasks, but not
others. On individual datasets, models with lower sample efficiency can even be
more robust. These results suggest that general-purpose methods for improving
sample efficiency are unlikely to yield universal OOD robustness improvements,
since such improvements are highly dataset- and task-dependent. Even in an era
of large, multi-purpose pretrained models, task-specific decisions may often be
necessary for OOD generalization.
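The abstract's two central quantities, sample efficiency (the amount of training data needed to reach a given ID accuracy) and OOD robustness (how the model fares on out-of-distribution evaluation), can be made concrete with a short sketch. This is not the authors' experimental code: the interpolation-based estimate, the function names, and all of the numbers below are illustrative assumptions.

```python
import numpy as np

def sample_efficiency(train_sizes, id_accuracies, target_id_acc):
    """Smallest training-set size (linearly interpolated) at which ID accuracy reaches the target."""
    sizes = np.asarray(train_sizes, dtype=float)
    accs = np.asarray(id_accuracies, dtype=float)
    if accs.max() < target_id_acc:
        return float("inf")  # target never reached within the data budgets tried
    # np.interp needs increasing x values, so interpolate size as a function of accuracy.
    order = np.argsort(accs)
    return float(np.interp(target_id_acc, accs[order], sizes[order]))

# Hypothetical learning curves for two modeling interventions (e.g. a smaller vs. a larger model).
sizes = [500, 1000, 2000, 4000, 8000]
models = {
    "smaller model": ([0.62, 0.68, 0.74, 0.79, 0.82], 0.65),  # (ID accuracy curve, OOD accuracy)
    "larger model": ([0.70, 0.76, 0.81, 0.85, 0.87], 0.71),
}
for name, (id_curve, ood_acc) in models.items():
    needed = sample_efficiency(sizes, id_curve, target_id_acc=0.80)
    print(f"{name}: ~{needed:.0f} examples to reach 80% ID accuracy; OOD accuracy {ood_acc:.2f}")
```

In the paper's terms, the larger model in this made-up example is both more sample efficient (fewer examples to hit the target ID accuracy) and more robust; the study's finding is that this pairing does not hold uniformly across interventions, tasks, and datasets.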
Related papers
- Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks [59.47851630504264]
Free-text explanations are expressive and easy to understand, but many datasets lack annotated explanation data.
We fine-tune T5-Large and OLMo-7B models and assess the impact of fine-tuning data quality, the number of fine-tuning samples, and few-shot selection methods.
The models are evaluated on 19 diverse OOD datasets across three tasks: natural language inference (NLI), fact-checking, and hallucination detection in abstractive summarization.
arXiv Detail & Related papers (2025-02-07T10:01:32Z)
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, imposing a negative impact on training efficiency and model performance.
Data selection has shown promise in identifying the most representative samples from the entire dataset.
We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
arXiv Detail & Related papers (2024-10-15T03:00:58Z)
- Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters [33.401355417911084]
This study is on leveraging the knowledge of large pretrained models to improve handling of OOD scenarios and tackle domain generalization problems.
We employ parameter-efficient fine-tuning (PEFT) techniques to effectively preserve OOD robustness while working with large models.
Our experiments and analysis confirm that the most effective approaches involve ensembling diverse models and increasing the scale of pretraining.
arXiv Detail & Related papers (2023-10-17T07:01:24Z)
- Effective Robustness against Natural Distribution Shifts for Models with Different Training Data [113.21868839569]
"Effective robustness" measures the extra out-of-distribution robustness beyond what can be predicted from the in-distribution (ID) performance.
We propose a new metric to evaluate and compare the effective robustness of models trained on different data; an illustrative sketch of effective robustness appears after this list.
arXiv Detail & Related papers (2023-02-02T19:28:41Z)
- Exploring The Landscape of Distributional Robustness for Question Answering Models [47.178481044045505]
The investigation spans over 350 models and 16 question answering datasets.
We find that, in many cases, model variations do not affect robustness.
We release all evaluations to encourage researchers to further analyze robustness trends for question answering models.
arXiv Detail & Related papers (2022-10-22T18:17:31Z)
- Understanding and Testing Generalization of Deep Networks on Out-of-Distribution Data [30.471871571256198]
Deep network models perform excellently on In-Distribution data, but can significantly fail on Out-Of-Distribution data.
This study analyzes the problem with experimental ID testing and designs an OOD test paradigm.
arXiv Detail & Related papers (2021-11-17T15:29:07Z)
- Complementary Ensemble Learning [1.90365714903665]
We derive a technique to improve performance of state-of-the-art deep learning models.
Specifically, we train auxiliary models which are able to complement state-of-the-art model uncertainty.
arXiv Detail & Related papers (2021-11-09T03:23:05Z)
- The Evolution of Out-of-Distribution Robustness Throughout Fine-Tuning [25.85044477227461]
Models that are more accurate on out-of-distribution data than a baseline fit to in-distribution accuracy would predict exhibit "effective robustness".
We find that models pre-trained on larger datasets exhibit effective robustness during training that vanishes at convergence.
We discuss several strategies for scaling effective robustness to the high-accuracy regime to improve the out-of-distribution accuracy of state-of-the-art models.
arXiv Detail & Related papers (2021-06-30T06:21:42Z)
- On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z)
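The two effective-robustness entries above describe the same quantity: the OOD accuracy a model achieves beyond what a baseline fit of OOD accuracy against ID accuracy over a pool of reference models predicts. The sketch below is a simplified illustration with made-up accuracy pairs and a plain linear baseline; published work typically fits the baseline on logit-transformed accuracies, so this is not any paper's exact metric.

```python
import numpy as np

# Hypothetical (ID accuracy, OOD accuracy) pairs for a pool of reference models.
id_acc = np.array([0.70, 0.75, 0.80, 0.85, 0.90])
ood_acc = np.array([0.52, 0.58, 0.63, 0.69, 0.74])

# Baseline: least-squares line predicting OOD accuracy from ID accuracy.
slope, intercept = np.polyfit(id_acc, ood_acc, deg=1)

def effective_robustness(model_id_acc, model_ood_acc):
    """OOD accuracy above (positive) or below (negative) what the ID-accuracy baseline predicts."""
    predicted_ood = slope * model_id_acc + intercept
    return model_ood_acc - predicted_ood

# A model with 0.80 ID accuracy and 0.67 OOD accuracy sits above the trend line,
# so it has positive effective robustness (about +0.04 here).
print(effective_robustness(0.80, 0.67))
```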