Exploring The Landscape of Distributional Robustness for Question
Answering Models
- URL: http://arxiv.org/abs/2210.12517v1
- Date: Sat, 22 Oct 2022 18:17:31 GMT
- Title: Exploring The Landscape of Distributional Robustness for Question
Answering Models
- Authors: Anas Awadalla, Mitchell Wortsman, Gabriel Ilharco, Sewon Min, Ian
Magnusson, Hannaneh Hajishirzi, Ludwig Schmidt
- Abstract summary: Investigation spans over 350 models and 16 question answering datasets.
We find that, in many cases, model variations do not affect robustness.
We release all evaluations to encourage researchers to further analyze robustness trends for question answering models.
- Score: 47.178481044045505
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We conduct a large empirical evaluation to investigate the landscape of
distributional robustness in question answering. Our investigation spans over
350 models and 16 question answering datasets, including a diverse set of
architectures, model sizes, and adaptation methods (e.g., fine-tuning, adapter
tuning, in-context learning, etc.). We find that, in many cases, model
variations do not affect robustness and in-distribution performance alone
determines out-of-distribution performance. Moreover, our findings indicate
that i) zero-shot and in-context learning methods are more robust to
distribution shifts than fully fine-tuned models; ii) few-shot prompt
fine-tuned models exhibit better robustness than few-shot fine-tuned span
prediction models; iii) parameter-efficient and robustness enhancing training
methods provide no significant robustness improvements. In addition, we
publicly release all evaluations to encourage researchers to further analyze
robustness trends for question answering models.
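The core analysis described above relates in-distribution (ID) and out-of-distribution (OOD) performance across many models. As a minimal sketch of how such a comparison can be set up (with illustrative, made-up model names and F1 scores, not the authors' released evaluation code), the snippet below fits a linear trend predicting OOD F1 from ID F1 and reports how far each model sits above or below that trend, one simple proxy for effective robustness:

    import numpy as np

    # Illustrative placeholder F1 scores (NOT results from the paper):
    # (in-distribution F1, out-of-distribution F1) per adaptation method.
    models = {
        "span-prediction fine-tuned": (88.0, 70.5),
        "prompt fine-tuned":          (85.5, 72.0),
        "in-context learning":        (78.0, 69.5),
        "adapter tuning":             (87.0, 70.0),
    }

    names = list(models)
    id_f1 = np.array([models[m][0] for m in names])
    ood_f1 = np.array([models[m][1] for m in names])

    # Fit a linear trend OOD ~ a * ID + b across all models.
    a, b = np.polyfit(id_f1, ood_f1, deg=1)

    # A model's residual above the trend line measures how much better it does
    # OOD than its ID score alone would predict.
    for name, x, y in zip(names, id_f1, ood_f1):
        residual = y - (a * x + b)
        print(f"{name:28s} ID={x:5.1f}  OOD={y:5.1f}  above trend={residual:+.2f}")

If every model lies on the same trend line, ID performance alone determines OOD performance, which is the pattern the abstract reports for many model variations; published robustness analyses often fit this trend on logit-transformed accuracies, a refinement the sketch omits for brevity.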
Related papers
- Assessing Robustness of Machine Learning Models using Covariate Perturbations [0.6749750044497732]
This paper proposes a comprehensive framework for assessing the robustness of machine learning models.
We explore various perturbation strategies to assess robustness and examine their impact on model predictions.
We demonstrate the effectiveness of our approach in comparing robustness across models, identifying the instabilities in the model, and enhancing model robustness.
arXiv Detail & Related papers (2024-08-02T14:41:36Z)
- The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning risks skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z)
- A Comprehensive Evaluation and Analysis Study for Chinese Spelling Check [53.152011258252315]
We show that making reasonable use of phonetic and graphic information is effective for Chinese Spelling Check.
Models are sensitive to the error distribution of the test set, which exposes their shortcomings.
The commonly used benchmark, SIGHAN, cannot reliably evaluate models' performance.
arXiv Detail & Related papers (2023-07-25T17:02:38Z)
- Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate that there is no single model that works best in all cases.
By choosing an appropriate bias model, we can obtain better robustness than baselines with a more sophisticated model design.
arXiv Detail & Related papers (2022-10-28T17:52:10Z)
- Are Sample-Efficient NLP Models More Robust? [90.54786862811183]
We investigate the relationship between sample efficiency (the amount of data needed to reach a given ID accuracy) and robustness (how models fare on OOD evaluation).
We find that higher sample efficiency is only correlated with better average OOD robustness on some modeling interventions and tasks, but not others.
These results suggest that general-purpose methods for improving sample efficiency are unlikely to yield universal OOD robustness improvements, since such improvements are highly dataset- and task-dependent.
arXiv Detail & Related papers (2022-10-12T17:54:59Z)
- Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation [3.728946517493471]
MEEE is a model-ensemble method that consists of optimistic exploration and weighted exploitation.
Our approach outperforms other model-free and model-based state-of-the-art methods, especially in sample complexity.
arXiv Detail & Related papers (2021-07-05T07:18:20Z)
- The Evolution of Out-of-Distribution Robustness Throughout Fine-Tuning [25.85044477227461]
Models that are more accurate on out-of-distribution data than a baseline fit to in-distribution accuracy would predict exhibit "effective robustness".
We find that models pre-trained on larger datasets exhibit effective robustness during training that vanishes at convergence.
We discuss several strategies for scaling effective robustness to the high-accuracy regime to improve the out-of-distribution accuracy of state-of-the-art models.
arXiv Detail & Related papers (2021-06-30T06:21:42Z)
- Voting based ensemble improves robustness of defensive models [82.70303474487105]
We study whether it is possible to create an ensemble to further improve robustness.
By ensembling several state-of-the-art pre-trained defense models, our method can achieve a 59.8% robust accuracy; a minimal sketch of such majority voting follows this list.
arXiv Detail & Related papers (2020-11-28T00:08:45Z)
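The last entry above describes combining several pre-trained defense models by voting. As a rough, self-contained sketch of majority voting over class predictions (a generic assumed interface, not the specific defense evaluated in that paper):

    import numpy as np

    def majority_vote(predictions: np.ndarray) -> np.ndarray:
        """Majority vote over integer class labels.

        predictions: shape (n_models, n_examples); returns one label per
        example, with ties resolved in favor of the smallest label.
        """
        n_classes = predictions.max() + 1
        # Count votes per class for each example (shape: n_classes x n_examples).
        votes = np.apply_along_axis(np.bincount, 0, predictions, minlength=n_classes)
        return votes.argmax(axis=0)

    # Toy usage: three models classify five examples (labels are placeholders).
    preds = np.array([
        [0, 1, 1, 0, 2],
        [0, 1, 0, 0, 2],
        [1, 1, 1, 0, 1],
    ])
    print(majority_vote(preds))  # -> [0 1 1 0 2]

Such an ensemble is only as robust as the agreement among its members on perturbed inputs; the 59.8% robust accuracy quoted above is specific to that paper's setup and is not reproduced by this toy example.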