The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design
- URL: http://arxiv.org/abs/2110.04541v1
- Date: Sat, 9 Oct 2021 11:05:16 GMT
- Title: The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design
- Authors: Yoav Levine, Noam Wies, Daniel Jannai, Dan Navon, Yedid Hoshen, Amnon Shashua
- Abstract summary: We show that a pretrained NLM can model stronger dependencies between text segments that appeared in the same training example than between text segments that appeared in different training examples.
We propose "kNN-Pretraining": we show that including semantically related non-neighboring sentences in the same pretraining example yields improved sentence representations and open-domain question answering abilities.
- Score: 34.900425311720795
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretraining Neural Language Models (NLMs) over a large corpus involves
chunking the text into training examples, which are contiguous text segments of
sizes processable by the neural architecture. We highlight a bias introduced by
this common practice: we prove that the pretrained NLM can model much stronger
dependencies between text segments that appeared in the same training example
than it can between text segments that appeared in different training examples.
This intuitive result has a twofold role. First, it formalizes the motivation
behind a broad line of recent successful NLM training heuristics, proposed for
the pretraining and fine-tuning stages, which do not necessarily appear related
at first glance. Second, our result clearly indicates further improvements to
be made in NLM pretraining for the benefit of Natural Language Understanding
tasks. As an example, we propose "kNN-Pretraining": we show that including
semantically related non-neighboring sentences in the same pretraining example
yields improved sentence representations and open-domain question answering
abilities. This theoretically motivated degree of freedom for "pretraining
example design" indicates new training schemes for self-improving
representations.
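As a rough illustration of this degree of freedom, the sketch below packs each sentence together with its nearest semantic neighbors into a single pretraining example. It assumes precomputed sentence embeddings via a hypothetical `embed` callable and shows one way the idea could be realized, not the authors' pipeline.

```python
# Sketch of kNN-based pretraining example construction (illustrative only).
# Assumes `embed(sentence)` returns a 1-D embedding vector; this helper is
# hypothetical and not part of the paper's released code.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_knn_examples(sentences, embed, k=4):
    """Group each sentence with its k most semantically related sentences
    into one pretraining example."""
    vectors = np.stack([embed(s) for s in sentences])            # (N, d) sentence embeddings
    nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(vectors)
    _, idx = nn.kneighbors(vectors)                              # idx[i, 0] is sentence i itself
    return [" ".join(sentences[j] for j in neighbors) for neighbors in idx]
```

Placing semantically related but non-neighboring sentences inside the same example lets the model exploit the stronger within-example dependencies that the result above formalizes.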
Related papers
- Expedited Training of Visual Conditioned Language Generation via
Redundancy Reduction [61.16125290912494]
$\text{EVL}_\text{Gen}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z) - The Learnability of In-Context Learning [16.182561312622315]
We propose a first-of-its-kind PAC based framework for in-context learnability.
Our framework includes an initial pretraining phase, which fits a function to the pretraining distribution.
We show that in-context learning is more about identifying the task than about learning it.
arXiv Detail & Related papers (2023-03-14T13:28:39Z) - Improving Few-Shot Performance of Language Models via Nearest Neighbor
Calibration [12.334422701057674]
We propose a novel nearest-neighbor calibration framework for in-context learning.
It is inspired by the observation that the in-context learning paradigm often produces incorrect labels when asked to infer the labels of training instances.
Experiments on various few-shot text classification tasks demonstrate that our method significantly improves in-context learning.
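As a heavily hedged sketch, one way such a calibration step could look is to blend the model's in-context label distribution with the labels of the most similar training instances; the embedding and blending choices below are assumptions, not the paper's exact procedure.

```python
# Toy nearest-neighbor calibration of an in-context prediction (illustrative).
# `train_vecs` / `test_vec` are instance embeddings, `train_labels` is an
# integer array of gold labels, `test_probs` is the model's label distribution.
import numpy as np

def knn_calibrate(test_probs, test_vec, train_vecs, train_labels, num_labels, k=8, alpha=0.5):
    """Blend the in-context label distribution with the empirical label
    distribution of the k most similar training instances."""
    sims = train_vecs @ test_vec / (
        np.linalg.norm(train_vecs, axis=1) * np.linalg.norm(test_vec) + 1e-9)
    nearest = np.argsort(-sims)[:k]                              # k nearest training instances
    neighbor_dist = np.bincount(train_labels[nearest], minlength=num_labels) / k
    return alpha * test_probs + (1 - alpha) * neighbor_dist      # calibrated distribution
```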
arXiv Detail & Related papers (2022-12-05T12:49:41Z) - Masked Autoencoders As The Unified Learners For Pre-Trained Sentence
Representation [77.47617360812023]
We extend the recently proposed MAE style pre-training strategy, RetroMAE, to support a wide variety of sentence representation tasks.
The first stage performs RetroMAE over generic corpora, like Wikipedia, BookCorpus, etc., from which the base model is learned.
The second stage takes place on domain-specific data, e.g., MS MARCO and NLI, where the base model is continually trained with RetroMAE and contrastive learning.
arXiv Detail & Related papers (2022-07-30T14:34:55Z) - SDCUP: Schema Dependency-Enhanced Curriculum Pre-Training for Table
Semantic Parsing [19.779493883522072]
This paper designs two novel pre-training objectives to impose the desired inductive bias on the learned representations for table pre-training.
We propose a schema-aware curriculum learning approach to mitigate the impact of noise and learn effectively from the pre-training data in an easy-to-hard manner.
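For illustration only, an easy-to-hard schedule can be as simple as ordering examples by a difficulty score; the `difficulty` function below is a hypothetical stand-in for SDCUP's schema-aware criterion.

```python
# Minimal easy-to-hard curriculum scheduler (illustrative, not SDCUP's code).
def curriculum_batches(examples, difficulty, batch_size=32):
    """Yield pre-training batches ordered from easy to hard, where
    `difficulty` maps an example to a scalar score."""
    ordered = sorted(examples, key=difficulty)                   # easiest examples first
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]
```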
arXiv Detail & Related papers (2021-11-18T02:51:04Z) - An Explanation of In-context Learning as Implicit Bayesian Inference [117.19809377740188]
We study the role of the pretraining distribution on the emergence of in-context learning.
We prove that in-context learning occurs implicitly via Bayesian inference of the latent concept.
We empirically find that scaling model size improves in-context accuracy even when the pretraining loss is the same.
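Informally, this Bayesian view amounts to marginalizing over a latent concept $\theta$ inferred from the prompt (notation condensed here, not quoted from the paper):

```latex
p(\mathrm{output} \mid \mathrm{prompt})
  = \int_{\theta} p(\mathrm{output} \mid \theta, \mathrm{prompt})\,
    p(\theta \mid \mathrm{prompt})\, d\theta
```

In-context learning then succeeds to the extent that $p(\theta \mid \mathrm{prompt})$ concentrates on the concept shared by the prompt's demonstrations.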
arXiv Detail & Related papers (2021-11-03T09:12:33Z) - COCO-LM: Correcting and Contrasting Text Sequences for Language Model
Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
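As an assumption-laden toy of this corrupt-then-correct signal (not the authors' code), an auxiliary model fills masked positions and the main model is trained to recover the original tokens; `auxiliary_fill` below is a hypothetical stand-in for that auxiliary LM.

```python
# Toy construction of a COCO-LM-style correction target (illustrative only).
import random

def corrupt_then_correct(tokens, auxiliary_fill, mask_rate=0.15, seed=None):
    """Mask a fraction of tokens, let an auxiliary LM fill them in, and
    return (corrupted sequence, original sequence); the main LM learns to
    correct the corrupted sequence back to the original."""
    rng = random.Random(seed)
    corrupted = list(tokens)
    masked = [i for i in range(len(tokens)) if rng.random() < mask_rate]
    for i in masked:
        corrupted[i] = "[MASK]"
    for i in masked:
        corrupted[i] = auxiliary_fill(corrupted, i)              # auxiliary LM's guess at position i
    return corrupted, list(tokens)
```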
arXiv Detail & Related papers (2021-02-16T22:24:29Z) - SLM: Learning a Discourse Language Representation with Sentence
Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this pre-training objective improves the performance of the original BERT by large margins.
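A minimal sketch of how a sentence-unshuffling instance might be constructed, assuming pre-split sentences; the actual SLM objective and architecture are richer, so this is purely illustrative.

```python
# Toy sentence-unshuffling instance for a sentence-ordering objective.
import random

def make_unshuffling_example(sentences, seed=None):
    """Shuffle a document's sentences; the target records where each
    original sentence ended up, so a model can learn to restore order."""
    rng = random.Random(seed)
    order = list(range(len(sentences)))
    rng.shuffle(order)
    shuffled = [sentences[i] for i in order]                     # shuffled[j] == sentences[order[j]]
    target = sorted(range(len(order)), key=order.__getitem__)    # target[i] = position of sentence i
    return shuffled, target
```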
arXiv Detail & Related papers (2020-10-30T13:33:41Z) - Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)