Emergent Abilities of Large Language Models
- URL: http://arxiv.org/abs/2206.07682v1
- Date: Wed, 15 Jun 2022 17:32:01 GMT
- Title: Emergent Abilities of Large Language Models
- Authors: Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph,
Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler,
Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean,
William Fedus
- Abstract summary: We consider an ability to be emergent if it is not present in smaller models but is present in larger models.
The existence of such emergence implies that additional scaling could further expand the range of capabilities of language models.
- Score: 172.08007363384218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scaling up language models has been shown to predictably improve performance
and sample efficiency on a wide range of downstream tasks. This paper instead
discusses an unpredictable phenomenon that we refer to as emergent abilities of
large language models. We consider an ability to be emergent if it is not
present in smaller models but is present in larger models. Thus, emergent
abilities cannot be predicted simply by extrapolating the performance of
smaller models. The existence of such emergence implies that additional scaling
could further expand the range of capabilities of language models.
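As a rough illustration of this definition, here is a minimal sketch with invented numbers (not data from the paper): accuracies of hypothetical smaller models are extrapolated to a larger model, and the extrapolation misses the jump that would mark an emergent ability.
```python
# Hypothetical illustration only: the sizes and accuracies below are invented.
import numpy as np

# log10(parameter count) and accuracy on a hypothetical 4-way task (chance = 0.25).
log_params = np.array([8.0, 9.0, 10.0, 11.0, 12.0])
accuracy = np.array([0.25, 0.26, 0.27, 0.55, 0.80])

# Extrapolate the largest model's accuracy from the three smallest models.
slope, intercept = np.polyfit(log_params[:3], accuracy[:3], deg=1)
extrapolated = slope * log_params[-1] + intercept

print(f"extrapolated accuracy at 1e12 params: {extrapolated:.2f}")  # ~0.29
print(f"observed accuracy at 1e12 params:     {accuracy[-1]:.2f}")  # 0.80
# The gap between the two is what the paper calls an emergent ability:
# performance that cannot be predicted by extrapolating smaller models.
```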
Related papers
- Effects of Scale on Language Model Robustness [7.725206196110384]
We show that, compared with smaller models, adversarially trained larger models generalize faster and better to modified attacks not seen during training.
We also analyze the offense/defense balance of increasing compute, finding parity in some settings and an advantage for offense in others.
arXiv Detail & Related papers (2024-07-25T17:26:41Z)
- Observational Scaling Laws and the Predictability of Language Model Performance [51.2336010244645]
We propose an observational approach that bypasses model training and instead builds scaling laws from 100 publicly available models.
We show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models (a toy sigmoid-fit sketch appears after this list).
We show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
arXiv Detail & Related papers (2024-05-17T17:49:44Z)
- Emergent Abilities in Reduced-Scale Generative Language Models [10.51168925267033]
Large language models can solve new tasks without task-specific fine-tuning.
This ability is considered an emergent ability and is primarily seen in large language models with billions of parameters.
This study investigates whether such emergent properties are strictly tied to model size or can be demonstrated by smaller models trained on reduced-scale data.
arXiv Detail & Related papers (2024-04-02T18:00:28Z)
- Evaluating Large Language Models on Controlled Generation Tasks [92.64781370921486]
We present an extensive analysis of various benchmarks including a sentence planning benchmark with different granularities.
After comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum showing where large language models fall behind, are comparable to, or exceed the ability of smaller models.
arXiv Detail & Related papers (2023-10-23T03:48:24Z)
- Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning [56.03057119008865]
We show that scaling diffusion language models can effectively make them strong language learners.
We build competent diffusion language models at scale by first acquiring knowledge from massive data.
Experiments show that scaling diffusion language models consistently improves performance across downstream language tasks.
arXiv Detail & Related papers (2023-08-23T16:01:12Z)
- Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z)
- Rarely a problem? Language models exhibit inverse scaling in their predictions following few-type quantifiers [0.6091702876917281]
We focus on 'few'-type quantifiers, as in 'few children like toys', which might pose a particular challenge for language models.
We present 960 English sentence stimuli from two human neurolinguistic experiments to 22 autoregressive transformer models of differing sizes.
arXiv Detail & Related papers (2022-12-16T20:01:22Z)
- Predicting on the Edge: Identifying Where a Larger Model Does Better [61.793778186198864]
We show that large models have the largest improvement on examples where the small model is most uncertain.
We show that a switcher model which defers examples to a larger model when the small model is uncertain can achieve striking improvements in performance and resource usage (a toy sketch of such a switcher appears after this list).
arXiv Detail & Related papers (2022-02-15T18:53:14Z)
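As referenced above, here is a minimal sketch of the sigmoidal-fit idea from "Observational Scaling Laws and the Predictability of Language Model Performance". This is not the paper's code; the compute values, accuracies, and initial guesses are invented for illustration.
```python
# Hypothetical sketch: fit a sigmoid of log-compute to small-model accuracies
# and use it to predict a larger model. All numbers are invented.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(log_compute, mid, slope, ceiling):
    """Accuracy as a smooth, sigmoidal function of log10(training compute)."""
    return ceiling / (1.0 + np.exp(-slope * (log_compute - mid)))

log_compute = np.array([20.0, 21.0, 22.0, 23.0])  # observed smaller models
accuracy = np.array([0.05, 0.12, 0.35, 0.62])

params, _ = curve_fit(sigmoid, log_compute, accuracy, p0=[23.0, 1.0, 0.9])
print(f"predicted accuracy at log10(compute)=25: {sigmoid(25.0, *params):.2f}")
```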
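And a minimal sketch of the uncertainty-based "switcher" idea from "Predicting on the Edge": route an example to the larger model only when the small model's predictive entropy exceeds a threshold. The predictor functions, toy distributions, and threshold are stand-ins, not the paper's implementation.
```python
# Hypothetical sketch of an uncertainty-based switcher between two models.
import math
from typing import Callable, Dict

def entropy(probs: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a distribution over candidate answers."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

def switcher(example: str,
             small_predict: Callable[[str], Dict[str, float]],
             large_predict: Callable[[str], Dict[str, float]],
             threshold: float = 0.5) -> str:
    small_probs = small_predict(example)
    if entropy(small_probs) <= threshold:    # small model is confident: answer now
        return max(small_probs, key=small_probs.get)
    large_probs = large_predict(example)     # otherwise defer to the larger model
    return max(large_probs, key=large_probs.get)

# Toy usage: fixed distributions stand in for real model outputs.
small = lambda x: {"yes": 0.55, "no": 0.45}  # high entropy -> defer
large = lambda x: {"yes": 0.95, "no": 0.05}
print(switcher("example input", small, large))  # -> "yes" (from the large model)
```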