Pre-training Text Representations as Meta Learning
- URL: http://arxiv.org/abs/2004.05568v1
- Date: Sun, 12 Apr 2020 09:05:47 GMT
- Title: Pre-training Text Representations as Meta Learning
- Authors: Shangwen Lv, Yuechen Wang, Daya Guo, Duyu Tang, Nan Duan, Fuqing Zhu,
Ming Gong, Linjun Shou, Ryan Ma, Daxin Jiang, Guihong Cao, Ming Zhou, Songlin
Hu
- Abstract summary: We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
- Score: 113.3361289756749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training text representations has recently been shown to significantly
improve the state-of-the-art in many natural language processing tasks. The
central goal of pre-training is to learn text representations that are useful
for subsequent tasks. However, existing approaches are optimized by minimizing
a proxy objective, such as the negative log likelihood of language modeling. In
this work, we introduce a learning algorithm which directly optimizes the model's
ability to learn text representations for effective learning of downstream
tasks. We show that there is an intrinsic connection between multi-task
pre-training and model-agnostic meta-learning with a sequence of meta-train
steps. The standard multi-task learning objective adopted in BERT is a special
case of our learning algorithm where the depth of meta-train is zero. We study
the problem in two settings: unsupervised pre-training and supervised
pre-training with different pre-training objectives to verify the generality of
our approach. Experimental results show that our algorithm brings improvements
and learns better initializations for a variety of downstream tasks.
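The connection stated in the abstract can be made concrete with a small sketch. The following is a minimal, self-contained illustration, not the paper's implementation: it assumes PyTorch, and the linear "encoder", the synthetic `sample_batch` data, and constants such as `DEPTH_K` are hypothetical stand-ins for a real Transformer and its pre-training corpora. With `DEPTH_K > 0`, each pre-training task takes a few differentiable meta-train steps on the shared encoder before its loss is back-propagated; with `DEPTH_K = 0`, the loop collapses to the standard multi-task objective used in BERT.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
DIM, N_TASKS, DEPTH_K = 16, 2, 1          # DEPTH_K = 0 recovers plain multi-task pre-training
INNER_LR, OUTER_LR = 0.1, 0.01

# Shared "encoder" weights plus one linear head per pre-training task (toy stand-ins).
W = (0.1 * torch.randn(DIM, DIM)).requires_grad_()
heads = [(0.1 * torch.randn(2, DIM)).requires_grad_() for _ in range(N_TASKS)]
opt = torch.optim.Adam([W, *heads], lr=OUTER_LR)

def sample_batch(task_id, n=32):
    """Synthetic batch standing in for one pre-training task (e.g. MLM vs. NSP)."""
    x = torch.randn(n, DIM)
    y = (x[:, task_id] > 0).long()
    return x, y

def task_loss(W_enc, task_id):
    x, y = sample_batch(task_id)
    reps = torch.tanh(F.linear(x, W_enc))              # the "text representation"
    return F.cross_entropy(F.linear(reps, heads[task_id]), y)

for step in range(201):
    opt.zero_grad()
    meta_objective = torch.zeros(())
    for task_id in range(N_TASKS):
        # DEPTH_K differentiable meta-train steps on the shared encoder for this task.
        W_fast = W
        for _ in range(DEPTH_K):
            g = torch.autograd.grad(task_loss(W_fast, task_id), W_fast, create_graph=True)[0]
            W_fast = W_fast - INNER_LR * g
        # Loss on a fresh batch using the adapted encoder; with DEPTH_K = 0 this is
        # exactly the standard multi-task loss on the original parameters.
        meta_objective = meta_objective + task_loss(W_fast, task_id)
    meta_objective.backward()
    opt.step()
    if step % 50 == 0:
        print(f"step {step:3d}  meta objective {meta_objective.item():.3f}")
```

Because the inner updates are built with `create_graph=True`, the outer Adam step differentiates through them (second-order, MAML-style); dropping that flag would give a cheaper first-order approximation.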
Related papers
- General-Purpose In-Context Learning by Meta-Learning Transformers [45.63069059498147]
We show that Transformers and other black-box models can be meta-trained to act as general-purpose in-context learners.
We characterize transitions between algorithms that generalize, algorithms that memorize, and algorithms that fail to meta-train at all.
We propose practical interventions such as biasing the training distribution that improve the meta-training and meta-generalization of general-purpose in-context learning algorithms.
arXiv Detail & Related papers (2022-12-08T18:30:22Z) - Grad2Task: Improved Few-shot Text Classification Using Gradients for
Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z) - Unified Multimodal Pre-training and Prompt-based Tuning for
Vision-Language Understanding and Generation [86.26522210882699]
We propose Unified multimodal pre-training for both Vision-Language understanding and generation.
The proposed UniVL is capable of handling both understanding tasks and generative tasks.
Our experiments show that there is a trade-off between understanding tasks and generation tasks while using the same model.
arXiv Detail & Related papers (2021-12-10T14:59:06Z) - MetaICL: Learning to Learn In Context [87.23056864536613]
We introduce MetaICL, a new meta-training framework for few-shot learning where a pretrained language model is tuned to do in-context learning on a large set of training tasks.
We show that MetaICL approaches (and sometimes beats) the performance of models fully finetuned on the target task training data, and outperforms much larger models with nearly 8x the parameters.
arXiv Detail & Related papers (2021-10-29T17:42:08Z) - Meta-learning for downstream aware and agnostic pretraining [7.2051162210119495]
We propose using meta-learning to select tasks that provide the most informative learning signals in each episode of pretraining.
We discuss the algorithm of the method and its two variants, downstream-aware and downstream-agnostic pretraining.
arXiv Detail & Related papers (2021-06-06T23:08:09Z) - Self-Supervised Meta-Learning for Few-Shot Natural Language
Classification Tasks [40.97125791174191]
We propose a self-supervised approach to generate a large, rich, meta-learning task distribution from unlabeled text.
We show that this meta-training leads to better few-shot generalization than language-model pre-training followed by finetuning.
arXiv Detail & Related papers (2020-09-17T17:53:59Z) - Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less
Forgetting [66.45372974713189]
We propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks.
Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark.
We provide open-source RecAdam, which integrates the proposed mechanisms into the Adam optimizer to benefit the NLP community.
arXiv Detail & Related papers (2020-04-27T08:59:57Z) - Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
arXiv Detail & Related papers (2020-04-23T04:21:19Z) - Train No Evil: Selective Masking for Task-Guided Pre-Training [97.03615486457065]
We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance with less than 50% of the cost.
arXiv Detail & Related papers (2020-04-21T03:14:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.