LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based
on Prompt Tuning of T5
- URL: http://arxiv.org/abs/2110.07298v1
- Date: Thu, 14 Oct 2021 12:06:29 GMT
- Title: LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based
on Prompt Tuning of T5
- Authors: Chengwei Qin and Shafiq Joty
- Abstract summary: We propose a unified framework for Lifelong Few-shot Language Learning (LFLL) based on prompt tuning (PT) of T5.
Our framework, called LFPT5, takes full advantage of PT's strong few-shot learning ability, and simultaneously trains the model as a task solver and a data generator.
With extensive experiments, we demonstrate that LFPT5 can be applied to various types of tasks and significantly outperform previous methods in different LFLL settings.
- Score: 3.04585143845864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing approaches to lifelong language learning rely on plenty of labeled
data for learning a new task, which is hard to obtain in most real scenarios.
Considering that humans can continually learn new tasks from a handful of
examples, we expect the models also to be able to generalize well on new
few-shot tasks without forgetting the previous ones. In this work, we define
this more challenging yet practical problem as Lifelong Few-shot Language
Learning (LFLL) and propose a unified framework for it based on prompt tuning
(PT) of T5. Our framework, called LFPT5, takes full advantage of PT's strong few-shot
learning ability, and simultaneously trains the model as a task solver and a
data generator. Before learning a new domain of the same task type, LFPT5
generates pseudo (labeled) samples of previously learned domains, and later
gets trained on those samples to alleviate forgetting of previous knowledge as
it learns the new domain. In addition, a KL divergence loss is minimized to
achieve label consistency between the previous and the current model. While
adapting to a new task type, LFPT5 includes and tunes additional prompt
embeddings for the new task. With extensive experiments, we demonstrate that
LFPT5 can be applied to various types of tasks and significantly
outperform previous methods in different LFLL settings.
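The abstract describes three mechanisms: soft-prompt tuning of a frozen T5, rehearsal on self-generated pseudo samples of earlier domains, and a KL divergence term that keeps label predictions consistent with the previous model. Below is a minimal sketch of how these pieces could fit together, assuming PyTorch and Hugging Face transformers; the prompt length, learning rate, loss weight, and toy examples are illustrative assumptions, not the paper's settings.

```python
# Minimal LFPT5-style sketch (not the authors' code): soft-prompt tuning of a
# frozen T5, rehearsal on a pseudo sample of an earlier domain, and a KL term
# for label consistency with the previous model.
import torch
import torch.nn.functional as F
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = "cpu"
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").to(device)
prev_model = T5ForConditionalGeneration.from_pretrained("t5-small").to(device).eval()

# Freeze the backbone; only the soft prompt is tuned.
for p in model.parameters():
    p.requires_grad = False

n_prompt, d_model = 20, model.config.d_model
soft_prompt = torch.nn.Parameter(torch.randn(n_prompt, d_model) * 0.02)
optimizer = torch.optim.AdamW([soft_prompt], lr=0.3)  # assumed hyperparameters
lambda_kl = 0.05                                      # assumed weight for the KL term


def forward_with_prompt(m, text, target):
    # Prepend the soft prompt to the token embeddings of the encoder input.
    enc = tokenizer(text, return_tensors="pt").to(device)
    labels = tokenizer(target, return_tensors="pt").input_ids.to(device)
    tok_embeds = m.get_input_embeddings()(enc.input_ids)
    inputs_embeds = torch.cat([soft_prompt.unsqueeze(0), tok_embeds], dim=1)
    attn = torch.cat(
        [torch.ones(1, n_prompt, device=device), enc.attention_mask], dim=1
    )
    return m(inputs_embeds=inputs_embeds, attention_mask=attn, labels=labels)


# One toy training step: a real example from the new domain plus a pseudo
# sample of a previously learned domain. In LFPT5 the pseudo sample would be
# generated by the model itself; a fixed string stands in here, and for
# brevity the previous model reuses the current prompt rather than its own
# frozen copy.
new_example = ("classify sentiment: the plot was gripping", "positive")
pseudo_example = ("classify sentiment: the battery died quickly", "negative")

loss_new = forward_with_prompt(model, *new_example).loss
out_cur = forward_with_prompt(model, *pseudo_example)
with torch.no_grad():
    out_prev = forward_with_prompt(prev_model, *pseudo_example)

# KL divergence between previous and current label distributions on the pseudo sample.
kl = F.kl_div(
    F.log_softmax(out_cur.logits, dim=-1),
    F.softmax(out_prev.logits, dim=-1),
    reduction="batchmean",
)
loss = loss_new + out_cur.loss + lambda_kl * kl
loss.backward()
optimizer.step()
```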
Related papers
- Limits of Transformer Language Models on Learning to Compose Algorithms [77.2443883991608]
We evaluate training LLaMA models and prompting GPT-4 and Gemini on four tasks that require learning a composition of several discrete sub-tasks.
Our results indicate that compositional learning in state-of-the-art Transformer language models is highly sample-inefficient.
arXiv Detail & Related papers (2024-02-08T16:23:29Z)
- Generalization to New Sequential Decision Making Tasks with In-Context Learning [23.36106067650874]
Training autonomous agents that can learn new tasks from only a handful of demonstrations is a long-standing problem in machine learning.
In this paper, we show that naively applying transformers to sequential decision making problems does not enable in-context learning of new tasks.
We investigate different design choices and find that larger model and dataset sizes, as well as more task diversity, environment stochasticity, and trajectory burstiness, all result in better in-context learning of new out-of-distribution tasks.
arXiv Detail & Related papers (2023-12-06T15:19:28Z)
- Preventing Catastrophic Forgetting in Continual Learning of New Natural Language Tasks [17.879087904904935]
Multi-Task Learning (MTL) is widely accepted in Natural Language Processing as a standard technique for learning multiple related tasks in one model.
As systems evolve over time, adding a new task to an existing MTL model usually requires retraining the model from scratch on all tasks.
In this paper, we approach the problem of incrementally expanding an MTL model's capability to solve new tasks over time by distilling the knowledge of a model already trained on n tasks into a new one that solves n+1 tasks.
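The distillation step lends itself to a compact illustration. The following is a minimal sketch under assumed shapes and hyperparameters (temperature, alpha), not the paper's implementation: the student reproduces the teacher's predictions on the n old task heads while learning task n+1 from its labels.

```python
# Assumed sketch of distilling an n-task MTL teacher into an (n+1)-task student.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskModel(nn.Module):
    def __init__(self, n_tasks, d_in=16, d_hid=32, n_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(d_hid, n_classes) for _ in range(n_tasks))

    def forward(self, x, task_id):
        return self.heads[task_id](self.encoder(x))


n_old = 3
teacher = MultiTaskModel(n_old).eval()      # already trained on n tasks
student = MultiTaskModel(n_old + 1)         # adds a head for task n+1
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(8, 16)                      # toy batch of inputs
y_new = torch.randint(0, 3, (8,))           # labels exist only for the new task
temperature, alpha = 2.0, 0.5               # assumed distillation hyperparameters

# Cross-entropy on the new task.
loss = F.cross_entropy(student(x, n_old), y_new)

# Distillation: match the teacher's softened predictions on every old task head.
for t in range(n_old):
    with torch.no_grad():
        p_teacher = F.softmax(teacher(x, t) / temperature, dim=-1)
    log_p_student = F.log_softmax(student(x, t) / temperature, dim=-1)
    loss = loss + alpha * F.kl_div(log_p_student, p_teacher, reduction="batchmean")

loss.backward()
optimizer.step()
```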
arXiv Detail & Related papers (2023-02-22T00:18:25Z)
- Exploring Efficient Few-shot Adaptation for Vision Transformers [70.91692521825405]
We propose a novel efficient Transformer Tuning (eTT) method that facilitates finetuning ViTs on few-shot learning tasks.
Key novelties come from the newly presented Attentive Prefix Tuning (APT) and Domain Residual Adapter (DRA).
We conduct extensive experiments to show the efficacy of our model.
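The summary names APT and DRA without detail; as a generic, assumed illustration of the prefix-tuning idea (not the paper's APT), learnable prefix tokens can be supplied as extra keys and values to a frozen attention block:

```python
# Generic prefix-tuning illustration (not the paper's APT): learnable prefix
# tokens are prepended as extra keys/values while the ViT block stays frozen.
# Dimensions are illustrative.
import torch
import torch.nn as nn

d_model, n_heads, n_prefix = 192, 3, 10
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
for p in attn.parameters():
    p.requires_grad = False                      # frozen backbone block

prefix = nn.Parameter(torch.randn(1, n_prefix, d_model) * 0.02)  # the only tuned tensor

patches = torch.randn(2, 197, d_model)           # [CLS] + 14x14 patch tokens
kv = torch.cat([prefix.expand(2, -1, -1), patches], dim=1)
out, _ = attn(query=patches, key=kv, value=kv)   # queries unchanged, prefixed keys/values
print(out.shape)                                 # torch.Size([2, 197, 192])
```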
arXiv Detail & Related papers (2023-01-06T08:42:05Z)
- Improving Cross-task Generalization of Unified Table-to-text Models with Compositional Task Configurations [63.04466647849211]
Existing methods typically encode task information with a simple dataset name as a prefix to the encoder.
We propose compositional task configurations, a set of prompts prepended to the encoder to improve cross-task generalization.
We show this not only allows the model to better learn shared knowledge across different tasks at training, but also allows us to control the model by composing new configurations.
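A minimal sketch of the idea, using hypothetical configuration components (the real prompt set is defined in the paper), shows how concatenated configuration prompts can be recombined to form new task setups:

```python
# Illustrative composition of task-configuration prompts prepended to the
# encoder input; component names ("task", "operations") are assumptions.

def build_input(task_type, operations, table, question):
    # Each configuration component is a short prompt; concatenation composes them.
    config = f"<task> {task_type} <operations> {operations}"
    return f"{config} <question> {question} <table> {table}"

# A combination seen at training time ...
print(build_input("table QA", "lookup",
                  "year | city || 2016 | Rio", "Which city hosted in 2016?"))

# ... and a new composition formed by swapping in a different operation,
# which is how configurations let us control the model at inference time.
print(build_input("table QA", "count",
                  "year | city || 2016 | Rio || 2012 | London", "How many rows?"))
```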
arXiv Detail & Related papers (2022-12-17T02:20:14Z)
- Task Residual for Tuning Vision-Language Models [69.22958802711017]
We propose a new efficient tuning approach for vision-language models (VLMs) named Task Residual Tuning (TaskRes).
TaskRes explicitly decouples the prior knowledge of the pre-trained models and new knowledge regarding a target task.
The proposed TaskRes is simple yet effective, and it significantly outperforms previous methods on 11 benchmark datasets.
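The mechanism can be sketched in a few lines; the shapes and scaling factor below are assumptions, not the released code: the classifier built from frozen pre-trained text embeddings stays intact, and only an additive residual on its weights is tuned.

```python
# Assumed TaskRes-style sketch: frozen text-based classifier plus a tuned residual.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_classes, d = 10, 512
base_text_weights = F.normalize(torch.randn(n_classes, d), dim=-1)  # frozen prior knowledge
residual = nn.Parameter(torch.zeros(n_classes, d))                  # task-specific, tuned
alpha = 0.5                                                          # assumed scaling factor
optimizer = torch.optim.Adam([residual], lr=1e-3)                    # only the residual is trained

def classify(image_features):
    # New knowledge is decoupled into the residual added to the frozen classifier.
    weights = F.normalize(base_text_weights + alpha * residual, dim=-1)
    return F.normalize(image_features, dim=-1) @ weights.t()        # cosine-similarity logits

logits = classify(torch.randn(4, d))
print(logits.shape)  # torch.Size([4, 10])
```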
arXiv Detail & Related papers (2022-11-18T15:09:03Z)
- Investigating Numeracy Learning Ability of a Text-to-Text Transfer Model [18.922352061424302]
We investigate the ability of a text-to-text transfer model (T5) to learn numeracy.
We consider four numeracy tasks: numeration, magnitude order prediction, finding minimum and maximum in a series, and sorting.
Although T5 models perform reasonably well in the interpolation setting, they struggle considerably in the extrapolation setting across all four tasks.
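For concreteness, the four tasks can be cast as text-to-text pairs along these templates (illustrative, not the paper's exact prompts):

```python
# Illustrative text-to-text framing of the four numeracy tasks probed for T5:
# each task becomes an input/target string pair.
numeracy_examples = [
    ("numeration: four hundred and twelve", "412"),   # numeration
    ("magnitude: which is larger, 73 or 37?", "73"),  # magnitude order prediction
    ("maximum: 18 4 92 55", "92"),                    # finding the maximum in a series
    ("sort ascending: 9 2 7", "2 7 9"),               # sorting
]
for source, target in numeracy_examples:
    print(f"{source!r} -> {target!r}")
```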
arXiv Detail & Related papers (2021-09-10T05:33:17Z)
- Lifelong Learning of Few-shot Learners across NLP Tasks [45.273018249235705]
We study the challenge of lifelong few-shot learning over a sequence of diverse NLP tasks.
We propose a continual meta-learning approach which learns to generate adapter weights from a few examples.
We demonstrate that our approach preserves model performance over training tasks and leads to positive knowledge transfer when future tasks are learned.
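A minimal sketch of generating adapter weights from a handful of examples, using an assumed pooling-based hypernetwork rather than the paper's exact architecture:

```python
# Assumed sketch: a hypernetwork pools a few support examples and emits the
# parameters of a small residual adapter applied inside a frozen base model.
import torch
import torch.nn as nn

d_model, d_adapter = 64, 8

class AdapterHyperNet(nn.Module):
    def __init__(self):
        super().__init__()
        n_params = 2 * d_model * d_adapter            # down- and up-projection
        self.generator = nn.Linear(d_model, n_params)

    def forward(self, support_reprs):                 # (k, d_model) few-shot examples
        task_repr = support_reprs.mean(dim=0)         # pool the support set
        flat = self.generator(task_repr)
        w_down, w_up = flat.split(d_model * d_adapter)
        return w_down.view(d_adapter, d_model), w_up.view(d_model, d_adapter)

def adapter(h, w_down, w_up):
    # Residual bottleneck adapter with generated weights.
    return h + torch.relu(h @ w_down.t()) @ w_up.t()

hypernet = AdapterHyperNet()
w_down, w_up = hypernet(torch.randn(4, d_model))      # 4 support examples
out = adapter(torch.randn(2, d_model), w_down, w_up)  # adapt hidden states of a query batch
print(out.shape)                                       # torch.Size([2, 64])
```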
arXiv Detail & Related papers (2021-04-18T10:41:56Z)
- Rectification-based Knowledge Retention for Continual Learning [49.1447478254131]
Deep learning models suffer from catastrophic forgetting when trained in an incremental learning setting.
We propose a novel approach to address the task incremental learning problem, which involves training a model on new tasks that arrive in an incremental manner.
Our approach can be used in both the zero-shot and non-zero-shot task incremental learning settings.
arXiv Detail & Related papers (2021-03-30T18:11:30Z)
- Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter-efficient transfer learning architecture, termed PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
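A minimal sketch of the patch-injection idea under assumed dimensions (not the released PeterRec code): only the inserted patches and the downstream head receive gradients, while the pre-trained parameters stay unaltered.

```python
# Assumed sketch of injecting small "patch" networks into a frozen pre-trained model.
import torch
import torch.nn as nn

d_model, d_patch = 64, 8

class Patch(nn.Module):
    # Small residual bottleneck inserted into the frozen network.
    def __init__(self):
        super().__init__()
        self.down, self.up = nn.Linear(d_model, d_patch), nn.Linear(d_patch, d_model)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))

pretrained_layer = nn.Linear(d_model, d_model)    # stands in for a pre-trained block
for p in pretrained_layer.parameters():
    p.requires_grad = False                       # pre-trained parameters stay unaltered

patch, head = Patch(), nn.Linear(d_model, 5)      # 5 classes in a hypothetical downstream task
optimizer = torch.optim.Adam(list(patch.parameters()) + list(head.parameters()), lr=1e-3)

x, y = torch.randn(8, d_model), torch.randint(0, 5, (8,))
logits = head(patch(pretrained_layer(x)))
loss = nn.functional.cross_entropy(logits, y)
loss.backward()
optimizer.step()
```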
arXiv Detail & Related papers (2020-01-13T14:09:54Z)