Are Emergent Abilities of Large Language Models a Mirage?
- URL: http://arxiv.org/abs/2304.15004v2
- Date: Mon, 22 May 2023 15:56:25 GMT
- Title: Are Emergent Abilities of Large Language Models a Mirage?
- Authors: Rylan Schaeffer, Brando Miranda, Sanmi Koyejo
- Abstract summary: Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models.
Here, we present an alternative explanation for emergent abilities: that for a particular task and model family, emergent abilities appear due to the researcher's choice of metric.
Specifically, nonlinear or discontinuous metrics produce apparent emergent abilities, whereas linear or continuous metrics produce smooth, continuous, predictable changes in model performance.
- Score: 9.683505038585988
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent work claims that large language models display emergent abilities,
abilities not present in smaller-scale models that are present in larger-scale
models. What makes emergent abilities intriguing is two-fold: their sharpness,
transitioning seemingly instantaneously from not present to present, and their
unpredictability, appearing at seemingly unforeseeable model scales. Here, we
present an alternative explanation for emergent abilities: that for a
particular task and model family, when analyzing fixed model outputs, emergent
abilities appear due to the researcher's choice of metric rather than due to
fundamental changes in model behavior with scale. Specifically, nonlinear or
discontinuous metrics produce apparent emergent abilities, whereas linear or
continuous metrics produce smooth, continuous, predictable changes in model
performance. We present our alternative explanation in a simple mathematical
model, then test it in three complementary ways: we (1) make, test and confirm
three predictions on the effect of metric choice using the InstructGPT/GPT-3
family on tasks with claimed emergent abilities; (2) make, test and confirm two
predictions about metric choices in a meta-analysis of emergent abilities on
BIG-Bench; and (3) show how to choose metrics to produce never-before-seen
seemingly emergent abilities in multiple vision tasks across diverse deep
networks. Via all three analyses, we provide evidence that alleged emergent
abilities evaporate with different metrics or with better statistics, and may
not be a fundamental property of scaling AI models.
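A minimal sketch of the metric-choice argument above, assuming a toy power-law for per-token accuracy (the constants, model sizes, and sequence length below are illustrative, not taken from the paper): the same smooth improvement looks abrupt under exact match, a nonlinear metric, and gradual under per-token accuracy, a linear metric.

```python
# Illustrative sketch (not the authors' code): score one smoothly improving
# model family with a nonlinear metric (exact match over L tokens) and with
# a linear metric (per-token accuracy).
import numpy as np

# Hypothetical model sizes spanning four orders of magnitude (parameters).
sizes = np.logspace(8, 12, 9)

# Assumed smooth power-law improvement in per-token accuracy with scale.
per_token_acc = np.exp(-(2e9 / sizes) ** 0.5)

L = 10  # number of tokens that must all be correct for exact match

exact_match = per_token_acc ** L   # nonlinear metric: appears to jump
token_accuracy = per_token_acc     # linear metric: changes smoothly

for n, ta, em in zip(sizes, token_accuracy, exact_match):
    print(f"params={n:9.1e}  per_token_acc={ta:.3f}  exact_match={em:.4f}")
```

Plotted against log model size, exact match stays near zero and then rises sharply at the largest scales, while per-token accuracy climbs gradually throughout; the apparent discontinuity comes from the metric, not from a change in the underlying model.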
Related papers
- Eureka: Evaluating and Understanding Large Foundation Models [23.020996995362104]
We present Eureka, an open-source framework for standardizing evaluations of large foundation models beyond single-score reporting and rankings.
We conduct an analysis of 12 state-of-the-art models, providing in-depth insights into failure understanding and model comparison.
arXiv Detail & Related papers (2024-09-13T18:01:49Z) - Observational Scaling Laws and the Predictability of Language Model Performance [51.2336010244645]
We propose an observational approach that bypasses model training and instead builds scaling laws from 100 publicly available models.
We show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models.
We show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
arXiv Detail & Related papers (2024-05-17T17:49:44Z) - Understanding Emergent Abilities of Language Models from the Loss Perspective [32.81782726603632]
We study emergent abilities through the lens of pre-training loss, instead of model size or training compute.
We discover that a model exhibits emergent abilities on certain tasks when its pre-training loss falls below a specific threshold.
This inspires us to redefine emergent abilities as those that manifest in models with lower pre-training losses.
arXiv Detail & Related papers (2024-03-23T11:03:31Z) - Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition [83.13280812128411]
Recent studies have uncovered intriguing phenomena in deep learning, such as grokking, double descent, and emergent abilities in large language models.
We present a comprehensive framework that provides a unified view of these three phenomena, focusing on the competition between memorization and generalization circuits.
arXiv Detail & Related papers (2024-02-23T08:14:36Z) - Limitations of Agents Simulated by Predictive Models [1.6649383443094403]
We outline two structural reasons for why predictive models can fail when turned into agents.
We show that both of those failures are fixed by including a feedback loop from the environment.
Our treatment provides a unifying view of those failure modes, and informs the question of why fine-tuning offline learned policies with online learning makes them more effective.
arXiv Detail & Related papers (2024-02-08T17:08:08Z) - Turning large language models into cognitive models [0.0]
We show that large language models can be turned into cognitive models.
These models offer accurate representations of human behavior, even outperforming traditional cognitive models in two decision-making domains.
Taken together, these results suggest that large, pre-trained models can be adapted to become generalist cognitive models.
arXiv Detail & Related papers (2023-06-06T18:00:01Z) - Specializing Smaller Language Models towards Multi-Step Reasoning [56.78474185485288]
We show that abilities can be distilled down from GPT-3.5 (≥ 175B) to T5 variants (≤ 11B).
We propose model specialization, to specialize the model's ability towards a target task.
arXiv Detail & Related papers (2023-01-30T08:51:19Z) - Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z) - Emergent Abilities of Large Language Models [172.08007363384218]
We consider an ability to be emergent if it is not present in smaller models but is present in larger models.
The existence of such emergence implies that additional scaling could further expand the range of capabilities of language models.
arXiv Detail & Related papers (2022-06-15T17:32:01Z) - FitVid: Overfitting in Pixel-Level Video Prediction [117.59339756506142]
We introduce a new architecture, named FitVid, which is capable of severe overfitting on the common benchmarks.
FitVid outperforms the current state-of-the-art models across four different video prediction benchmarks on four different metrics.
arXiv Detail & Related papers (2021-06-24T17:20:21Z)
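The Observational Scaling Laws entry above reports that several emergent phenomena follow smooth, sigmoidal curves that are predictable from smaller models. A hedged sketch of that fit-and-extrapolate idea on synthetic data (the sigmoid form, compute range, and noise level are assumptions, not figures from that paper):

```python
# Illustrative sketch: fit a sigmoid to benchmark accuracy versus log-compute
# using only the smaller models, then extrapolate to the larger ones.
# All numbers here are synthetic placeholders.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, slope, midpoint):
    return 1.0 / (1.0 + np.exp(-slope * (x - midpoint)))

rng = np.random.default_rng(0)
log_compute = np.linspace(18, 26, 20)       # hypothetical log10 training FLOPs
observed = sigmoid(log_compute, 1.2, 22.0) + rng.normal(0, 0.02, log_compute.size)

small = log_compute < 22                    # fit on the smaller models only
params, _ = curve_fit(sigmoid, log_compute[small], observed[small], p0=[1.0, 22.0])

# Compare extrapolated predictions with the held-out large-model observations.
predicted = sigmoid(log_compute[~small], *params)
for lc, pred, obs in zip(log_compute[~small], predicted, observed[~small]):
    print(f"log10 FLOPs={lc:5.2f}  predicted={pred:.3f}  observed={obs:.3f}")
```

If the underlying trend really is sigmoidal, the curve fitted on small models alone tracks the larger models' performance, which is what makes the behavior predictable rather than emergent.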
This list is automatically generated from the titles and abstracts of the papers on this site.