Evaluating the Impact of Model Scale for Compositional Generalization in
Semantic Parsing
- URL: http://arxiv.org/abs/2205.12253v1
- Date: Tue, 24 May 2022 17:57:39 GMT
- Title: Evaluating the Impact of Model Scale for Compositional Generalization in
Semantic Parsing
- Authors: Linlu Qiu, Peter Shaw, Panupong Pasupat, Tianze Shi, Jonathan Herzig,
Emily Pitler, Fei Sha, Kristina Toutanova
- Abstract summary: Recent work has shown considerable improvements on many NLP tasks from model scaling.
Fine-tuning generally has flat or negative scaling curves on out-of-distribution compositional generalization.
In-context learning has positive scaling curves, but is generally outperformed by much smaller fine-tuned models.
- Score: 38.770055054268965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their strong performance on many tasks, pre-trained language models
have been shown to struggle on out-of-distribution compositional
generalization. Meanwhile, recent work has shown considerable improvements on
many NLP tasks from model scaling. Can scaling up model size also improve
compositional generalization in semantic parsing? We evaluate encoder-decoder
models up to 11B parameters and decoder-only models up to 540B parameters, and
compare model scaling curves for three different methods for transfer learning:
fine-tuning all parameters, prompt tuning, and in-context learning. We observe
that fine-tuning generally has flat or negative scaling curves on
out-of-distribution compositional generalization in semantic parsing
evaluations. In-context learning has positive scaling curves, but is generally
outperformed by much smaller fine-tuned models. Prompt tuning can outperform
fine-tuning and exhibits a more positive scaling curve, suggesting further
potential improvements from scaling. Additionally, we identify several error
trends that vary with model scale. For example, larger models are generally
better at modeling the syntax of the output space, but are also more prone to
certain types of overfitting. Overall, our study highlights limitations of
current techniques for effectively leveraging model scale for compositional
generalization, while our analysis also suggests promising directions for
future work.
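To make the notion of a scaling curve concrete, the following is a minimal sketch (with hypothetical model sizes and accuracies, not results from the paper) of how out-of-distribution accuracy for the three transfer methods might be summarized as a slope against log model size: a positive slope corresponds to a positive scaling curve, a slope near zero to a flat one, and a negative slope to a negative one.

```python
import numpy as np

# Hypothetical exact-match accuracies on an out-of-distribution compositional
# split, indexed by model size. These numbers are illustrative placeholders,
# not results reported in the paper.
model_sizes = np.array([0.06e9, 0.25e9, 0.8e9, 3e9, 11e9])  # parameter counts
accuracy = {
    "fine-tuning":         np.array([0.31, 0.33, 0.32, 0.31, 0.30]),  # roughly flat
    "prompt tuning":       np.array([0.18, 0.22, 0.27, 0.33, 0.38]),  # positive
    "in-context learning": np.array([0.02, 0.04, 0.08, 0.13, 0.19]),  # positive, lower
}

def scaling_slope(sizes, acc):
    """Least-squares slope of accuracy against log10(parameters):
    >0 is a positive scaling curve, ~0 flat, <0 negative."""
    slope, _intercept = np.polyfit(np.log10(sizes), acc, deg=1)
    return slope

for method, acc in accuracy.items():
    print(f"{method:>20s}: accuracy change per decade of scale = "
          f"{scaling_slope(model_sizes, acc):+.3f}")
```

Under these placeholder numbers, fine-tuning's slope is near zero while prompt tuning's is the largest, mirroring the qualitative trends described in the abstract.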
Related papers
- A Hitchhiker's Guide to Scaling Law Estimation [56.06982415792523]
Scaling laws predict the loss of a target machine learning model by extrapolating from easier-to-train models with fewer parameters or smaller training sets.
We estimate more than 1000 scaling laws, then derive a set of best practices for estimating scaling laws in new model families.
arXiv Detail & Related papers (2024-10-15T17:59:10Z)
- Observational Scaling Laws and the Predictability of Language Model Performance [51.2336010244645]
We propose an observational approach that bypasses model training and instead builds scaling laws from 100 publicly available models.
We show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models (a minimal fitting sketch follows after this list).
We show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
arXiv Detail & Related papers (2024-05-17T17:49:44Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation [35.72916406365469]
We compare the generalization of few-shot fine-tuning and in-context learning to challenge datasets.
Our results show that fine-tuned language models can in fact generalize well out-of-domain.
arXiv Detail & Related papers (2023-05-26T13:55:17Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes factorizing the data-generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero- and few-shot adaptation in low-data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
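As a companion to the scaling-law entries above (in particular the sigmoidal behavior noted for the observational scaling laws paper), here is a minimal sketch of fitting a smooth sigmoidal curve to benchmark accuracy as a function of log-compute and extrapolating from smaller models; the data points and the parameterization are assumptions for illustration, not the cited papers' actual procedures or results.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(log_compute, floor, height, midpoint, scale):
    """Sigmoidal scaling curve: accuracy rises smoothly from `floor`
    toward `floor + height` as log-compute passes `midpoint`."""
    return floor + height / (1.0 + np.exp(-(log_compute - midpoint) / scale))

# Synthetic accuracies for smaller models only (illustrative, not real data);
# x is log10 of training compute in FLOPs.
log_compute_small = np.array([18.0, 19.0, 20.0, 21.0, 22.0, 23.0])
acc_small = np.array([0.10, 0.12, 0.16, 0.24, 0.36, 0.52])

# Fit on the small models, then extrapolate to a larger compute budget.
params, _cov = curve_fit(sigmoid, log_compute_small, acc_small,
                         p0=[0.1, 0.8, 23.0, 1.5], maxfev=10000)
print("predicted accuracy at log10(FLOPs) = 25:",
      round(float(sigmoid(25.0, *params)), 3))
```

A similar extrapolate-from-smaller-models idea is what the scaling-law estimation entries above describe, with loss rather than downstream accuracy as the predicted quantity.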
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.