Exploring Strategies for Generalizable Commonsense Reasoning with
Pre-trained Models
- URL: http://arxiv.org/abs/2109.02837v1
- Date: Tue, 7 Sep 2021 03:13:06 GMT
- Title: Exploring Strategies for Generalizable Commonsense Reasoning with
Pre-trained Models
- Authors: Kaixin Ma, Filip Ilievski, Jonathan Francis, Satoru Ozaki, Eric
Nyberg, Alessandro Oltramari
- Abstract summary: We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
- Score: 62.28551903638434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Commonsense reasoning benchmarks have been largely solved by fine-tuning
language models. The downside is that fine-tuning may cause models to overfit
to task-specific data and thereby forget their knowledge gained during
pre-training. Recent works therefore propose only lightweight model updates, as
models may already possess useful knowledge from past experience, but a
challenge remains in understanding which parts of a model should be refined,
and to what extent, for a given task. In this paper, we investigate what models learn from commonsense
reasoning datasets. We measure the impact of three different adaptation methods
on the generalization and accuracy of models. Our experiments with two models
show that fine-tuning performs best, by learning both the content and the
structure of the task, but suffers from overfitting and limited generalization
to novel answers. We observe that alternative adaptation methods like
prefix-tuning have comparable accuracy, but generalize better to unseen answers
and are more robust to adversarial splits.
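The parameter-efficiency gap between the adaptation methods compared in the abstract can be made concrete with a back-of-the-envelope calculation. The sketch below (all model sizes are illustrative assumptions, not figures from the paper) contrasts the trainable-parameter count of full fine-tuning with that of prefix-tuning, which trains only a short key/value prefix per layer:

```python
# Toy trainable-parameter comparison: full fine-tuning vs. prefix-tuning.
# All sizes below are illustrative assumptions, not figures from the paper.

def full_finetune_params(n_layers: int, d_model: int, d_ff: int) -> int:
    """Rough per-layer transformer parameter count (attention + FFN),
    all of which full fine-tuning updates."""
    attn = 4 * d_model * d_model   # Q, K, V, and output projections
    ffn = 2 * d_model * d_ff       # two feed-forward projections
    return n_layers * (attn + ffn)

def prefix_tuning_params(n_layers: int, d_model: int, prefix_len: int) -> int:
    """Prefix-tuning trains only a key prefix and a value prefix per layer."""
    return n_layers * 2 * prefix_len * d_model

full = full_finetune_params(n_layers=12, d_model=768, d_ff=3072)
prefix = prefix_tuning_params(n_layers=12, d_model=768, prefix_len=10)

print(f"full fine-tuning : {full:,} trainable parameters")
print(f"prefix-tuning    : {prefix:,} trainable parameters")
print(f"ratio            : {prefix / full:.4%}")
```

For a BERT-base-sized configuration, prefix-tuning updates well under 1% of the parameters that full fine-tuning does, which is one intuition for why it perturbs pre-trained knowledge less.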
Related papers
- LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views [28.081794908107604]
Fine-tuning is used to leverage the power of pre-trained foundation models in new downstream tasks.
Recent studies have observed challenges in the generalization of fine-tuned models to unseen distributions.
We propose a novel generalizable fine-tuning method LEVI, where the pre-trained model is adaptively ensembled layer-wise with a small task-specific model.
arXiv Detail & Related papers (2024-02-07T08:16:40Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of
General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and
Evaluation [35.72916406365469]
We compare the generalization of few-shot fine-tuning and in-context learning to challenge datasets.
Our results show that fine-tuned language models can in fact generalize well out-of-domain.
arXiv Detail & Related papers (2023-05-26T13:55:17Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
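The simplest instance of merging models in parameter space is uniform averaging of matching weights across fine-tuned checkpoints; the sketch below illustrates that baseline (the paper itself proposes a more refined merging scheme, and the state dicts here are hypothetical toy values):

```python
# Minimal sketch of parameter-space model merging: uniform averaging of
# per-parameter weights across fine-tuned checkpoints of the same
# architecture. This is the simplest baseline for the idea, not the
# paper's actual method. All values below are hypothetical.

def merge_state_dicts(state_dicts):
    """Average each named parameter across a list of models.
    Each state dict maps parameter name -> flat list of floats."""
    merged = {}
    for name in state_dicts[0]:
        values = [sd[name] for sd in state_dicts]
        merged[name] = [sum(col) / len(col) for col in zip(*values)]
    return merged

# Two hypothetical fine-tuned checkpoints of the same architecture.
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
model_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.5]}

merged = merge_state_dicts([model_a, model_b])
print(merged)  # {'layer.weight': [2.0, 3.0], 'layer.bias': [1.0]}
```

No training data is touched at any point, which is what makes the fusion "dataless": the merge operates purely on the checkpoints.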
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Investigating Ensemble Methods for Model Robustness Improvement of Text
Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate that there is no single model that works best for all cases.
By choosing an appropriate bias model, we can obtain a better robustness result than baselines with a more sophisticated model design.
arXiv Detail & Related papers (2022-10-28T17:52:10Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised
Ensemble Learning [92.89846887298852]
Consider making predictions on new test data without any opportunity to learn from a training set of labelled data. Instead, you are given access to a set of expert models and their predictions, alongside some limited information about the datasets used to train them.
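One heuristic way to combine such experts per instance, sketched below, is to weight each expert's prediction by how close the test input is to a summary (here, the mean) of that expert's training data. This is an illustrative toy, not the paper's method, and all inputs are hypothetical one-dimensional values:

```python
# Sketch of instance-wise unsupervised ensembling: weight each expert's
# prediction by proximity of the test input to that expert's training-data
# mean. A heuristic illustration, not the paper's actual method.
import math

def instance_weights(x, training_means, temperature=1.0):
    """Softmax over negative distances from x to each expert's training mean."""
    scores = [-abs(x - m) / temperature for m in training_means]
    shifted = [math.exp(s - max(scores)) for s in scores]  # numerically stable
    total = sum(shifted)
    return [e / total for e in shifted]

def combine(x, expert_preds, training_means):
    """Instance-wise weighted average of the experts' predictions."""
    w = instance_weights(x, training_means)
    return sum(wi * p for wi, p in zip(w, expert_preds))

# Two hypothetical experts, trained on data centred at x=0 and x=10.
pred = combine(x=9.0, expert_preds=[0.2, 0.9], training_means=[0.0, 10.0])
print(pred)
```

Because the test point lies close to the second expert's training region, the combined prediction ends up dominated by that expert, without any labelled data being used.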
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes factorizing the data-generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Evaluating the Impact of Model Scale for Compositional Generalization in
Semantic Parsing [38.770055054268965]
Recent work has shown considerable improvements on many NLP tasks from model scaling.
Fine-tuning generally has flat or negative scaling curves on out-of-distribution compositional generalization.
In-context learning has positive scaling curves, but is generally outperformed by much smaller fine-tuned models.
arXiv Detail & Related papers (2022-05-24T17:57:39Z)
- Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora.
It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons.
We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning.
arXiv Detail & Related papers (2022-03-14T20:13:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.