MetaICL: Learning to Learn In Context
- URL: http://arxiv.org/abs/2110.15943v1
- Date: Fri, 29 Oct 2021 17:42:08 GMT
- Title: MetaICL: Learning to Learn In Context
- Authors: Sewon Min, Mike Lewis, Luke Zettlemoyer, Hannaneh Hajishirzi
- Abstract summary: We introduce MetaICL, a new meta-training framework for few-shot learning where a pretrained language model is tuned to do in-context learning on a large set of training tasks.
We show that MetaICL approaches (and sometimes beats) the performance of models fully finetuned on the target task training data, and outperforms much bigger models with nearly 8x parameters.
- Score: 87.23056864536613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce MetaICL (Meta-training for In-Context Learning), a new
meta-training framework for few-shot learning where a pretrained language model
is tuned to do in-context learning on a large set of training tasks. This
meta-training enables the model to more effectively learn a new task in context
at test time, by simply conditioning on a few training examples with no
parameter updates or task-specific templates. We experiment on a large, diverse
collection of tasks consisting of 142 NLP datasets including classification,
question answering, natural language inference, paraphrase detection and more,
across seven different meta-training/target splits. MetaICL outperforms a range
of baselines including in-context learning without meta-training and multi-task
learning followed by zero-shot transfer. We find that the gains are
particularly significant for target tasks that have domain shifts from the
meta-training tasks, and that using a diverse set of the meta-training tasks is
key to improvements. We also show that MetaICL approaches (and sometimes beats)
the performance of models fully finetuned on the target task training data, and
outperforms much bigger models with nearly 8x parameters.
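As a rough illustration of the setup the abstract describes (a minimal sketch only; the function name, separator, sampling details, and loss-masking convention are assumptions, not taken from the authors' released code), a MetaICL-style meta-training instance can be formed by concatenating k training examples from a sampled task with one query input, with the model trained to predict only the query's output:

```python
# Minimal sketch of MetaICL-style meta-training data construction.
# Assumptions (illustrative only, not the paper's released code):
# example format, separator string, and loss-masking details.
import random
from typing import Dict, List, Tuple

def build_meta_training_instance(
    task_examples: List[Dict[str, str]],  # each {"input": ..., "output": ...}
    k: int = 16,
    sep: str = "\n",
) -> Tuple[str, str]:
    """Sample k+1 examples from one meta-training task; the first k serve
    as in-context demonstrations, the last provides the training target."""
    sampled = random.sample(task_examples, k + 1)
    demos, query = sampled[:k], sampled[k]
    context = sep.join(f"{ex['input']} {ex['output']}" for ex in demos)
    # Condition on the demonstrations plus the query input; the language
    # modeling loss is computed only on the returned target string.
    prompt = f"{context}{sep}{query['input']}"
    target = f" {query['output']}"
    return prompt, target

# At test time the same concatenation format is reused with k examples
# from the *new* task, and the model simply conditions on them --
# no parameter updates and no task-specific templates.
```

The point mirrored here is that meta-training and test-time in-context learning share the same input format, which is what lets the tuned model learn a new task purely from the conditioning examples.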
Related papers
- MAML-en-LLM: Model Agnostic Meta-Training of LLMs for Improved In-Context Learning [43.512739869120125]
We propose MAML-en-LLM, a novel method for meta-training large language models (LLMs)
MAML-en-LLM can learn truly generalizable parameters that not only perform well on disjointed tasks but also adapt to unseen tasks.
We demonstrate that MAML-en-LLM outperforms baselines in settings with a limited amount of training data on both seen and unseen domains.
arXiv Detail & Related papers (2024-05-19T04:49:42Z) - Architecture, Dataset and Model-Scale Agnostic Data-free Meta-Learning [119.70303730341938]
We propose ePisode cUrriculum inveRsion (ECI) during data-free meta training and invErsion calibRation following inner loop (ICFIL) during meta testing.
ECI adaptively increases the difficulty level of pseudo episodes according to the real-time feedback of the meta model.
We formulate the optimization process of meta training with ECI as an adversarial form in an end-to-end manner.
arXiv Detail & Related papers (2023-03-20T15:10:41Z) - Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z) - Task Attended Meta-Learning for Few-Shot Learning [3.0724051098062097]
We introduce a training curriculum motivated by selective focus in humans, called task attended meta-training, to weight the tasks in a batch.
Comparisons of the resulting models with their non-task-attended counterparts on complex datasets validate the effectiveness of this curriculum.
arXiv Detail & Related papers (2021-06-20T07:34:37Z) - Meta-Regularization by Enforcing Mutual-Exclusiveness [0.8057006406834467]
We propose a regularization technique for meta-learning models that gives the model designer more control over the information flow during meta-training.
Our proposed regularization function shows an accuracy boost of $\sim 36\%$ on the Omniglot dataset.
arXiv Detail & Related papers (2021-01-24T22:57:19Z) - Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks [40.97125791174191]
We propose a self-supervised approach to generate a large, rich, meta-learning task distribution from unlabeled text.
We show that this meta-training leads to better few-shot generalization than language-model pre-training followed by finetuning.
arXiv Detail & Related papers (2020-09-17T17:53:59Z) - Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z) - Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning [79.25478727351604]
We explore a simple process: meta-learning over a whole-classification pre-trained model on its evaluation metric.
We observe this simple method achieves competitive performance to state-of-the-art methods on standard benchmarks.
arXiv Detail & Related papers (2020-03-09T20:06:36Z) - Incremental Meta-Learning via Indirect Discriminant Alignment [118.61152684795178]
We develop a notion of incremental learning during the meta-training phase of meta-learning.
Our approach performs favorably at test time as compared to training a model with the full meta-training set.
arXiv Detail & Related papers (2020-02-11T01:39:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.