On the Effect of Pretraining Corpora on In-context Learning by a
Large-scale Language Model
- URL: http://arxiv.org/abs/2204.13509v1
- Date: Thu, 28 Apr 2022 13:59:54 GMT
- Title: On the Effect of Pretraining Corpora on In-context Learning by a
Large-scale Language Model
- Authors: Seongjin Shin, Sang-Woo Lee, Hwijeen Ahn, Sungdong Kim, HyoungSeok
Kim, Boseop Kim, Kyunghyun Cho, Gichang Lee, Woomyoung Park, Jung-Woo Ha,
Nako Sung
- Abstract summary: We investigate the effects of the source and size of the pretraining corpus on in-context learning in a Korean-centric GPT-3 model.
We find that in-context learning performance heavily depends on the corpus domain source, and the size of the pretraining corpus does not necessarily determine the emergence of in-context learning.
- Score: 56.82120834538467
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many recent studies on large-scale language models have reported successful
in-context zero- and few-shot learning ability. However, an in-depth analysis
of when in-context learning occurs is still lacking. For example, it is unknown
how in-context learning performance changes as the training corpus varies.
Here, we investigate the effects of the source and size of the pretraining
corpus on in-context learning in HyperCLOVA, a Korean-centric GPT-3 model. From
our in-depth investigation, we report the following observations: (1)
in-context learning performance heavily depends on the corpus domain source,
and the size of the pretraining corpus does not necessarily determine the
emergence of in-context learning, (2) in-context learning ability can emerge
when a language model is trained on a combination of multiple corpora, even
when each corpus does not result in in-context learning on its own, (3)
pretraining with a corpus related to a downstream task does not always
guarantee competitive in-context learning performance on that task, especially
in the few-shot setting, and (4) language modeling ability (measured by
perplexity) does not always correlate with in-context learning ability: e.g.,
low perplexity does not always imply high in-context few-shot learning
performance.
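
To make the quantities contrasted in observation (4) concrete, the sketch below shows how in-context few-shot prediction and perplexity are typically measured for a causal language model. It is not the paper's evaluation code: the "gpt2" checkpoint stands in for HyperCLOVA, and the toy sentiment task, prompt template, and label-scoring rule are illustrative assumptions.

```python
# Minimal sketch of k-shot in-context evaluation and perplexity measurement.
# Assumptions: a generic Hugging Face causal LM ("gpt2" is a stand-in for
# HyperCLOVA, which is not assumed to be available) and a toy sentiment task.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative stand-in checkpoint, not the paper's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()


def total_nll(text: str) -> float:
    """Total negative log-likelihood of `text` under the language model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)  # out.loss = mean NLL per predicted token
    return out.loss.item() * (ids.size(1) - 1)


def few_shot_predict(demonstrations, query, labels):
    """Score each candidate label by the NLL of the full k-shot prompt.

    Because the prompt prefix is identical for every candidate, comparing
    total NLL amounts to comparing the likelihood of the label tokens
    (length-sensitive; per-token or label-only scoring is also common).
    """
    prompt = "".join(f"Review: {x}\nSentiment: {y}\n\n" for x, y in demonstrations)
    scores = {
        label: total_nll(prompt + f"Review: {query}\nSentiment: {label}")
        for label in labels
    }
    return min(scores, key=scores.get)  # lowest NLL wins


def perplexity(text: str) -> float:
    """Perplexity = exp(mean NLL per token), the language-modeling measure."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return torch.exp(out.loss).item()


demos = [("Great movie, loved it.", "positive"), ("Terrible plot.", "negative")]
print(few_shot_predict(demos, "An absolute delight.", ["positive", "negative"]))
print(perplexity("The size of the pretraining corpus varies across sources."))
```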
Related papers
- Transformer verbatim in-context retrieval across time and scale [2.7941582470640784]
In some cases, language models must retrieve in-context information verbatim.
We show that verbatim in-context retrieval developed in a sudden transition early in the training process.
We find that the development of verbatim in-context retrieval is positively correlated with the learning of zero-shot benchmarks.
arXiv Detail & Related papers (2024-11-11T15:50:01Z)
- Conditional Language Learning with Context [19.708303468664088]
We propose a simple modification to causal language modeling called conditional finetuning.
We show that a context can "explain away" certain corpus statistics and make the model avoid learning them.
arXiv Detail & Related papers (2024-06-04T05:22:24Z)
- RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z)
- SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z)
- Pre-Training to Learn in Context [138.0745138788142]
The in-context learning ability of language models is not fully exploited because they are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x as many parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- The Learnability of In-Context Learning [16.182561312622315]
We propose a first-of-its-kind PAC based framework for in-context learnability.
Our framework includes an initial pretraining phase, which fits a function to the pretraining distribution.
We show that in-context learning is more about identifying the task than about learning it.
arXiv Detail & Related papers (2023-03-14T13:28:39Z)
- What Can Transformers Learn In-Context? A Case Study of Simple Function Classes [67.06980111346245]
In-context learning refers to the ability of a model to condition on a prompt sequence consisting of in-context examples.
We show that standard Transformers can be trained from scratch to perform in-context learning of linear functions (a minimal sketch of this setup follows the list below).
We also show that we can train Transformers to in-context learn more complex function classes with performance that matches or exceeds task-specific learning algorithms.
arXiv Detail & Related papers (2022-08-01T18:01:40Z)
- An Explanation of In-context Learning as Implicit Bayesian Inference [117.19809377740188]
We study the role of the pretraining distribution on the emergence of in-context learning.
We prove that in-context learning occurs implicitly via Bayesian inference of the latent concept.
We empirically find that scaling model size improves in-context accuracy even when the pretraining loss is the same.
arXiv Detail & Related papers (2021-11-03T09:12:33Z)
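
As referenced in the "What Can Transformers Learn In-Context?" entry above, the sketch below shows the linear-function in-context learning setup: a prompt is a sequence of (x, w·x) pairs followed by a query x. The transformer itself is omitted; a least-squares fit to the in-context examples plays the role of the task-specific baseline that trained transformers are reported to match. All names, dimensions, and sample counts are illustrative assumptions.

```python
# Minimal sketch of in-context learning of linear functions: each prompt is a
# fresh random task w with k demonstration pairs (x_i, w.x_i) and a query x.
# A least-squares solver stands in for the task-specific baseline; the trained
# transformer that would consume such prompts is not included here.
import numpy as np

rng = np.random.default_rng(0)


def sample_prompt(dim: int = 5, k: int = 10):
    """Draw a random linear task w and k in-context examples (x_i, w.x_i)."""
    w = rng.normal(size=dim)            # latent task vector
    xs = rng.normal(size=(k, dim))      # in-context inputs
    ys = xs @ w                         # in-context targets
    x_query = rng.normal(size=dim)      # query the learner must answer
    return xs, ys, x_query, w


def least_squares_predict(xs, ys, x_query):
    """Task-specific baseline: estimate w from the demonstrations, then predict."""
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return x_query @ w_hat


xs, ys, x_query, w = sample_prompt()
print("least-squares prediction:", least_squares_predict(xs, ys, x_query))
print("true target value       :", x_query @ w)
```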
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.