Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language
Models
- URL: http://arxiv.org/abs/2305.17311v1
- Date: Sat, 27 May 2023 00:07:17 GMT
- Title: Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language
Models
- Authors: Yuhui Zhang, Michihiro Yasunaga, Zhengping Zhou, Jeff Z. HaoChen,
James Zou, Percy Liang, Serena Yeung
- Abstract summary: We introduce NeQA, a dataset consisting of questions with negation.
We show that this task can exhibit inverse scaling, U-shaped scaling, or positive scaling.
We find that task 1 has linear scaling, while task 2 has sigmoid-shaped scaling with an emergent transition point.
- Score: 92.11542797811461
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models have been shown to exhibit positive scaling, where
performance improves as models are scaled up in terms of size, compute, or
data. In this work, we introduce NeQA, a dataset consisting of questions with
negation in which language models do not exhibit straightforward positive
scaling. We show that this task can exhibit inverse scaling, U-shaped scaling,
or positive scaling, and the three scaling trends shift in this order as we use
more powerful prompting methods or model families. We hypothesize that solving
NeQA depends on two subtasks: question answering (task 1) and negation
understanding (task 2). We find that task 1 has linear scaling, while task 2
has sigmoid-shaped scaling with an emergent transition point, and composing
these two scaling trends yields the final scaling trend of NeQA. Our work
reveals the complex scaling trends of language models and provides a way to
analyze them.
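To make the claimed composition concrete, the sketch below simulates how a linearly scaling subtask (question answering) combined with a sigmoid-shaped subtask (negation understanding) can produce a non-monotonic final curve. This is a minimal, hypothetical illustration: the functional forms, coefficients, and the composition rule in neqa_accuracy are assumptions for exposition, not the paper's fitted model.

```python
import numpy as np

def task1_qa_accuracy(log_scale):
    """Task 1 (question answering): assumed roughly linear in log model
    scale, clipped to [0, 1]. Coefficients are illustrative, not fitted."""
    return np.clip(0.3 + 0.07 * np.asarray(log_scale), 0.0, 1.0)

def task2_negation_accuracy(log_scale, transition=6.0, sharpness=2.0):
    """Task 2 (negation understanding): assumed sigmoid-shaped, with an
    emergent transition point where models begin to process negation."""
    return 1.0 / (1.0 + np.exp(-sharpness * (np.asarray(log_scale) - transition)))

def neqa_accuracy(log_scale):
    """Hypothetical composition on a binary-choice negated question:
    if negation is understood (task 2), the model answers with task-1
    accuracy; if not, it answers the un-negated question, so it tends to
    be wrong exactly when task 1 would have been right."""
    p_qa = task1_qa_accuracy(log_scale)
    p_neg = task2_negation_accuracy(log_scale)
    return p_neg * p_qa + (1.0 - p_neg) * (1.0 - p_qa)

if __name__ == "__main__":
    scales = np.linspace(1, 10, 10)  # stand-in for log(model size or compute)
    for s, acc in zip(scales, neqa_accuracy(scales)):
        print(f"log-scale {s:4.1f}  ->  simulated NeQA accuracy {acc:.3f}")
```

Under these assumed parameters the simulated curve first declines and then recovers, i.e. U-shaped scaling; shifting the assumed transition point of task 2 earlier (as stronger prompting or a stronger model family might do) moves the dip toward smaller scales so the observed range looks positively scaling, while shifting it later makes the observed range look inversely scaling.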
Related papers
- U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models [1.14179290793997]
Large language models (LLMs) have been shown to exhibit emergent abilities in some downstream tasks.
We observe U-shaped scaling for hard questions, and inverted-U scaling followed by steady improvement for easy questions.
We propose a simple yet effective pipeline, called Slice-and-Sandwich, to predict both the emergence threshold and model performance beyond the threshold.
arXiv Detail & Related papers (2024-10-02T16:03:49Z)
- Observational Scaling Laws and the Predictability of Language Model Performance [51.2336010244645]
We propose an observational approach that bypasses model training and instead builds scaling laws from 100 publicly available models.
We show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models.
We show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
arXiv Detail & Related papers (2024-05-17T17:49:44Z)
- The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning [34.76303922401322]
We study two natural scaling techniques -- weight pruning and simply training a smaller or larger model.
We find a striking difference in how two core abilities -- fact recall and in-context learning -- evolve with scaling.
The fact that both dense scaling and weight pruning exhibit this behavior suggests that scaling model size has an inherently disparate effect on fact recall and in-context learning.
arXiv Detail & Related papers (2023-10-07T03:36:39Z)
- Predicting Emergent Abilities with Infinite Resolution Evaluation [85.89911520190711]
We introduce PassUntil, an evaluation strategy with theoretically infinite resolution, through massive sampling in the decoding phase.
We predict the performance of the 2.4B model on code generation with merely 0.05% deviation before training starts.
We identify a kind of accelerated emergence whose scaling curve cannot be fitted by a standard scaling-law function.
arXiv Detail & Related papers (2023-10-05T02:35:00Z)
- Inverse Scaling: When Bigger Isn't Better [80.42834197416444]
Large language models (LMs) show predictable improvements to overall loss with increased scale.
We present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale.
arXiv Detail & Related papers (2023-06-15T20:11:23Z)
- Emergent inabilities? Inverse scaling over the course of pretraining [0.6091702876917281]
We investigate whether, over the course of training, the performance of language models at specific tasks can decrease while general performance remains high.
We find that for two tasks from the Inverse Scaling Challenge - quote-repetition and redefine-math - this is indeed the case.
This highlights the importance of testing model performance at all relevant benchmarks any time they are trained on additional data, even if their overall performance improves.
arXiv Detail & Related papers (2023-05-24T03:42:43Z)
- Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z)
- Inverse scaling can become U-shaped [126.64521446943155]
Scaling up language models has been empirically shown to improve performance on a wide range of downstream tasks.
This paper takes a closer look at these inverse scaling tasks.
We evaluate models of up to 540B parameters, trained on five times more compute than those evaluated in the Inverse Scaling Prize.
arXiv Detail & Related papers (2022-11-03T17:26:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.