Pre-Training to Learn in Context
- URL: http://arxiv.org/abs/2305.09137v1
- Date: Tue, 16 May 2023 03:38:06 GMT
- Title: Pre-Training to Learn in Context
- Authors: Yuxian Gu, Li Dong, Furu Wei, Minlie Huang
- Abstract summary: The ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
- Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x as many parameters.
- Score: 138.0745138788142
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning, where pre-trained language models learn to perform tasks from task examples and instructions in their contexts, has attracted much attention in the NLP community. However, the ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context. To this end, we propose PICL (Pre-training for In-Context Learning), a framework that enhances the language models' in-context learning ability by pre-training the model on a large collection of "intrinsic tasks" found in a general plain-text corpus, using the simple language modeling objective. PICL encourages the model to infer and perform tasks by conditioning on the contexts while maintaining the task generalization of pre-trained models. We evaluate the in-context learning performance of the model trained with PICL on seven widely used text classification datasets and the Super-NaturalInstructions benchmark, which contains 100+ NLP tasks formulated as text generation. Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x as many parameters. The code is publicly available at https://github.com/thu-coai/PICL.
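The abstract describes pre-training on "intrinsic tasks" gathered from plain text with an ordinary language-modeling objective. The sketch below illustrates that idea under simplifying assumptions: the offline step that groups paragraphs into intrinsic tasks (which the paper handles separately) is not shown, and `intrinsic_task_groups` is a hypothetical stand-in for its output.

```python
# Minimal sketch of PICL-style pre-training: concatenate several instances of
# the same intrinsic task into one context and train with the plain causal
# language-modeling loss over the whole sequence.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Each group holds paragraphs that (implicitly) share the same intrinsic task;
# earlier paragraphs act as in-context demonstrations for the later ones.
# The example texts are illustrative, not from the paper's corpus.
intrinsic_task_groups = [
    [
        "The movie was a waste of time. Sentimentally, it felt hollow.",
        "A heartfelt story with superb acting. I loved every minute.",
        "Dull plot, wooden dialogue, and no payoff at the end.",
    ],
]

model.train()
for group in intrinsic_task_groups:
    # Concatenate instances of the same intrinsic task into one context.
    text = "\n\n".join(group)
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    # Plain causal language-modeling loss over the whole concatenation.
    loss = model(input_ids=batch["input_ids"],
                 attention_mask=batch["attention_mask"],
                 labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The abstract's remark about maintaining the task generalization of pre-trained models suggests the full framework also keeps ordinary language modeling on the raw corpus; the sketch above only shows the intrinsic-task side.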
Related papers
- Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting [15.69952375347308]
Language models have the ability to perform in-context learning (ICL), allowing them to flexibly adapt their behavior based on context.
We study structural in-context algorithms in a simple part-of-speech setting using both practical and toy models.
We find that active forgetting, a technique that was recently introduced to help models generalize to new languages, forces models to adopt structural in-context learning solutions.
arXiv Detail & Related papers (2024-05-28T21:38:20Z)
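A minimal sketch of the active-forgetting mechanism mentioned above, assuming it amounts to periodically re-initializing the token-embedding table during training so the network is pushed toward context-based rather than memorized (in-weights) solutions; the tiny recurrent model, random data, and reset interval are illustrative, not the paper's setup.

```python
import torch
import torch.nn as nn

vocab_size, d_model, seq_len, reset_every = 1000, 64, 32, 100

embedding = nn.Embedding(vocab_size, d_model)
backbone = nn.GRU(d_model, d_model, batch_first=True)
head = nn.Linear(d_model, vocab_size)
params = list(embedding.parameters()) + list(backbone.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(1, 1001):
    tokens = torch.randint(0, vocab_size, (8, seq_len + 1))  # toy batch
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    hidden, _ = backbone(embedding(inputs))
    logits = head(hidden)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # Active forgetting: wipe the embedding table every `reset_every` updates
    # (optimizer state is left untouched here for simplicity).
    if step % reset_every == 0:
        nn.init.normal_(embedding.weight, mean=0.0, std=0.02)
```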
- Adapting Large Language Models to Domains via Reading Comprehension [86.24451681746676]
We explore how continued pre-training on domain-specific corpora influences large language models.
We show that training on the raw corpora endows the model with domain knowledge, but drastically hurts its ability to answer questions.
We propose a simple method for transforming raw corpora into reading comprehension texts.
arXiv Detail & Related papers (2023-09-18T07:17:52Z)
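A minimal sketch of the kind of transformation the summary describes, turning a raw passage into a reading-comprehension-style training text; the two templates and the helper name `to_reading_comprehension` are illustrative assumptions, not the paper's actual task-mining rules.

```python
# Append simple comprehension-style tasks to a raw passage so the model keeps
# practicing question answering while absorbing domain text.
def to_reading_comprehension(passage: str) -> str:
    first_sentence = passage.split(". ")[0].strip()
    tasks = [
        "Question: What is the main topic of the passage above?\n"
        f"Answer: {first_sentence}.",
        "Task: Summarize the passage above in one sentence.\n"
        f"Summary: {first_sentence}.",
    ]
    return passage.strip() + "\n\n" + "\n\n".join(tasks)

raw = ("Continued pre-training on domain corpora injects domain knowledge. "
       "It can, however, hurt a model's ability to answer questions.")
print(to_reading_comprehension(raw))
```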
- SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z)
- Generate to Understand for Representation [3.5325087487696463]
GUR is a pretraining framework that combines language modeling and contrastive learning objectives in a single training step.
GUR achieves impressive results without any labeled training data, outperforming all other pretrained baselines as a retriever on the recall benchmark in a zero-shot setting.
arXiv Detail & Related papers (2023-06-14T06:00:18Z)
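A minimal sketch of a single training step that combines a language-modeling loss with an in-batch contrastive (InfoNCE) loss, as the GUR summary describes; the toy encoder, the way paired views are obtained, the temperature, and the 1:1 loss weighting are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model, seq_len, batch = 1000, 64, 16, 8

embedding = nn.Embedding(vocab_size, d_model)
encoder = nn.GRU(d_model, d_model, batch_first=True)
lm_head = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (batch, seq_len + 1))
views = torch.randint(0, vocab_size, (batch, seq_len))  # stand-in for a paired view of each text

def encode(ids):
    hidden, _ = encoder(embedding(ids))
    return hidden

# Language-modeling loss (next-token prediction).
hidden = encode(tokens[:, :-1])
lm_loss = F.cross_entropy(lm_head(hidden).reshape(-1, vocab_size),
                          tokens[:, 1:].reshape(-1))

# Contrastive loss: mean-pooled representations of each text and its paired
# view attract each other and repel the other items in the batch (InfoNCE).
z1 = F.normalize(encode(tokens[:, :-1]).mean(dim=1), dim=-1)
z2 = F.normalize(encode(views).mean(dim=1), dim=-1)
logits = z1 @ z2.T / 0.05                 # temperature 0.05 (assumed)
contrastive_loss = F.cross_entropy(logits, torch.arange(batch))

loss = lm_loss + contrastive_loss         # both objectives in one training step
loss.backward()
```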
- EXnet: Efficient In-context Learning for Data-less Text classification [0.0]
We present EXnet, a model specifically designed to perform in-context learning without limitations on the number of examples.
We argue that in-context learning is an effective method to increase task accuracy, and providing examples facilitates cross-task generalization.
With extensive experiments, we show that even our smallest model (15M parameters) generalizes to several unseen classification tasks and domains.
arXiv Detail & Related papers (2023-05-24T01:40:57Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Annotated Dataset Creation through General Purpose Language Models for non-English Medical NLP [0.5482532589225552]
In our work, we suggest leveraging pretrained language models for training data acquisition.
We create a custom dataset which we use to train a medical NER model for German texts, GPTNERMED.
arXiv Detail & Related papers (2022-08-30T18:42:55Z)
- How Context Affects Language Models' Factual Predictions [134.29166998377187]
We integrate information from a retrieval system with a pre-trained language model in a purely unsupervised way.
We report that augmenting pre-trained language models in this way dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline.
arXiv Detail & Related papers (2020-05-10T09:28:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.