In-context Learning Distillation: Transferring Few-shot Learning Ability
of Pre-trained Language Models
- URL: http://arxiv.org/abs/2212.10670v1
- Date: Tue, 20 Dec 2022 22:11:35 GMT
- Title: In-context Learning Distillation: Transferring Few-shot Learning Ability
of Pre-trained Language Models
- Authors: Yukun Huang, Yanda Chen, Zhou Yu, Kathleen McKeown
- Abstract summary: We introduce in-context learning distillation to transfer in-context few-shot learning ability from large models to smaller models.
We perform in-context learning distillation under two different few-shot learning paradigms: Meta In-context Tuning (Meta-ICT) and Multitask In-context Tuning (Multitask-ICT).
Our experiments and analysis reveal that in-context learning objectives and language modeling objectives are complementary under the Multitask-ICT paradigm.
- Score: 55.78264509270503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given the success with in-context learning of large pre-trained language
models, we introduce in-context learning distillation to transfer in-context
few-shot learning ability from large models to smaller models. We propose to
combine in-context learning objectives with language modeling objectives to
distill both the ability to read in-context examples and task knowledge to the
smaller models. We perform in-context learning distillation under two different
few-shot learning paradigms: Meta In-context Tuning (Meta-ICT) and Multitask
In-context Tuning (Multitask-ICT). Multitask-ICT performs better on multitask
few-shot learning but also requires more computation than Meta-ICT. Our method
shows consistent improvements for both Meta-ICT and Multitask-ICT on two
benchmarks: LAMA and CrossFit. Our extensive experiments and analysis reveal
that in-context learning objectives and language modeling objectives are
complementary under the Multitask-ICT paradigm. In-context learning objectives
achieve the best performance when combined with language modeling objectives.
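The abstract describes tuning a smaller student model with an in-context learning objective combined with a language modeling objective. Below is a minimal PyTorch-style sketch of one way such a combined loss could look; the HuggingFace-style model interface, the soft-label KL distillation on answer tokens, the interpolation weight `alpha`, and the temperature are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch (not the authors' released code): distill in-context
# few-shot behavior from a large teacher LM into a smaller student LM by
# combining a distillation loss on the teacher's in-context predictions
# with a standard language modeling loss. Names and weighting are assumptions.
import torch
import torch.nn.functional as F

def icl_distillation_loss(teacher, student, batch, alpha=0.5, temperature=2.0):
    """batch["input_ids"]: few-shot prompts (in-context demonstrations + query);
    batch["labels"]: target token ids for the query answer, -100 elsewhere."""
    with torch.no_grad():
        teacher_logits = teacher(input_ids=batch["input_ids"]).logits

    student_out = student(input_ids=batch["input_ids"], labels=batch["labels"])
    student_logits = student_out.logits

    # In-context learning objective: match the teacher's token distributions
    # on the masked target positions (soft-label KL distillation).
    mask = batch["labels"] != -100
    kl = F.kl_div(
        F.log_softmax(student_logits[mask] / temperature, dim=-1),
        F.softmax(teacher_logits[mask] / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Language modeling objective: standard next-token cross-entropy on the
    # hard labels, intended to transfer task knowledge directly.
    lm_loss = student_out.loss

    return alpha * kl + (1.0 - alpha) * lm_loss
```

Since the abstract reports the two objectives to be complementary under Multitask-ICT, the interpolation weight would in practice be tuned per setting rather than fixed at 0.5.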
Related papers
- ConML: A Universal Meta-Learning Framework with Task-Level Contrastive Learning [49.447777286862994]
ConML is a universal meta-learning framework that can be applied to various meta-learning algorithms.
We demonstrate that ConML integrates seamlessly with optimization-based, metric-based, and amortization-based meta-learning algorithms.
arXiv Detail & Related papers (2024-10-08T12:22:10Z)
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z)
- MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models [74.89629463600978]
In the vision-language domain, most large-scale pre-trained vision-language models do not possess the ability to conduct in-context learning.
In this paper, we study an interesting hypothesis: can we transfer the in-context learning ability from the language domain to the vision domain?
arXiv Detail & Related papers (2023-06-02T07:21:03Z)
- Meta-in-context learning in large language models [38.28912796214566]
In-context learning -- the ability to improve at a task after being provided with a number of demonstrations -- is seen as one of the main contributors to the success of large language models.
We coin this phenomenon meta-in-context learning.
We show that meta-in-context learning adaptively reshapes a large language model's priors over expected tasks.
arXiv Detail & Related papers (2023-05-22T10:40:36Z)
- Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning [38.37682598345653]
We introduce a multimodal meta-learning approach to bridge the gap between vision and language models.
We define a meta-mapper network, acting as a meta-learner, to efficiently bridge frozen large-scale vision and language models.
We evaluate our approach on recently proposed multimodal few-shot benchmarks, measuring how rapidly the model can bind novel visual concepts to words.
arXiv Detail & Related papers (2023-02-28T17:46:18Z)
- Few-shot Multimodal Multitask Multilingual Learning [0.0]
We propose few-shot learning for a multimodal multitask multilingual (FM3) setting by adapting pre-trained vision and language models.
FM3 learns the most prominent tasks in the vision and language domains along with their intersections.
arXiv Detail & Related papers (2023-02-19T03:48:46Z)
- Multitask Learning for Low Resource Spoken Language Understanding [26.106133114838215]
We train models on dual objectives with automatic speech recognition and intent classification or sentiment classification.
Our models, although being of modest size, show improvements over models trained end-to-end on intent classification.
We study the performance of the models in a low-resource scenario by training them with as few as one example per class.
arXiv Detail & Related papers (2022-11-24T16:38:17Z)
- MetaICL: Learning to Learn In Context [87.23056864536613]
We introduce MetaICL, a new meta-training framework for few-shot learning where a pretrained language model is tuned to do in-context learning on a large set of training tasks (see the sketch after this list).
We show that MetaICL approaches (and sometimes beats) the performance of models fully finetuned on the target task training data, and outperforms much larger models with nearly 8x the parameters.
arXiv Detail & Related papers (2021-10-29T17:42:08Z)
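MetaICL tunes a pretrained language model to perform in-context learning by casting each training instance from its many training tasks as a few-shot prompt. A minimal sketch of such prompt construction is given below; the newline separator, field names, and loss masking convention are illustrative assumptions, not the paper's exact format.

```python
# Illustrative sketch (not the authors' code) of MetaICL-style meta-training
# data construction: k demonstrations and one query from the same task are
# concatenated into a single sequence, and the loss is applied only to the
# query's output tokens.
import random

def build_meta_training_example(task_examples, k=4, sep="\n"):
    """task_examples: list of {"input": str, "output": str} from one task;
    assumes len(task_examples) >= k + 1."""
    sampled = random.sample(task_examples, k + 1)
    context, query = sampled[:-1], sampled[-1]

    # Concatenated demonstrations serve as conditioning context.
    prompt = sep.join(f"{d['input']} {d['output']}" for d in context)
    prompt = f"{prompt}{sep}{query['input']} "

    # The language modeling loss is applied only to the target span.
    return {"prompt": prompt, "target": query["output"]}
```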
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.