Meta-learning for downstream aware and agnostic pretraining
- URL: http://arxiv.org/abs/2106.03270v1
- Date: Sun, 6 Jun 2021 23:08:09 GMT
- Title: Meta-learning for downstream aware and agnostic pretraining
- Authors: Hongyin Luo, Shuyan Dong, Yung-Sung Chuang, Shang-Wen Li
- Abstract summary: We propose using meta-learning to select tasks that provide the most informative learning signals in each episode of pretraining.
We discuss the algorithm of the method and its two variants, downstream-aware and downstream-agnostic pretraining.
- Score: 7.2051162210119495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural network pretraining is gaining attention due to its outstanding
performance in natural language processing applications. However, pretraining
usually leverages predefined task sequences to learn general linguistic clues.
The lack of a mechanism for choosing proper tasks during pretraining makes
learning and knowledge encoding inefficient. We thus propose using
meta-learning to select the tasks that provide the most informative learning
signals in each episode of pretraining. With the proposed method, we aim to
achieve better computation and memory efficiency for both the pretraining
process and the resulting networks while maintaining performance. In this
preliminary work, we discuss the algorithm and its two variants,
downstream-aware and downstream-agnostic pretraining. Our experiment plan is
also summarized, while empirical results will be shared in future work.
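Since the abstract only names the method, a minimal sketch may help make the episode-level idea concrete. The code below is an assumption about how per-episode task selection could be organized, not the paper's actual algorithm: each candidate pretraining task is scored by how much a one-step probe update reduces an evaluation loss, which would be a downstream loss in the downstream-aware variant and could be a held-out pretraining loss in the downstream-agnostic one. All names (probe_gain, pretrain_episode) and the toy objectives are hypothetical.

```python
import copy
import torch
from torch import nn

def probe_gain(model, task_loss_fn, eval_loss_fn, lr=1e-3):
    """Hypothetical scoring rule: take one gradient step on a candidate
    pretraining task and measure how much the evaluation loss drops."""
    probe = copy.deepcopy(model)                  # cheap probe copy of the learner
    opt = torch.optim.SGD(probe.parameters(), lr=lr)
    before = eval_loss_fn(probe).item()
    opt.zero_grad()
    task_loss_fn(probe).backward()                # one inner step on the candidate task
    opt.step()
    after = eval_loss_fn(probe).item()
    return before - after                         # larger gain = more informative task

def pretrain_episode(model, candidate_tasks, eval_loss_fn, k=2, lr=1e-3):
    """One pretraining episode: score all candidate tasks, keep the top-k,
    and update the model on the selected tasks only."""
    gains = {name: probe_gain(model, fn, eval_loss_fn, lr)
             for name, fn in candidate_tasks.items()}
    selected = sorted(gains, key=gains.get, reverse=True)[:k]
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for name in selected:
        opt.zero_grad()
        candidate_tasks[name](model).backward()
        opt.step()
    return selected, gains

if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Linear(8, 8)
    x = torch.randn(32, 8)
    # Toy stand-ins for pretraining objectives and a downstream criterion.
    tasks = {
        "mlm_like": lambda m: ((m(x) - x) ** 2).mean(),           # reconstruct inputs
        "noise":    lambda m: (m(torch.randn(32, 8)) ** 2).mean() # uninformative task
    }
    # Downstream-aware variant: evaluate on a held-out downstream-style loss.
    downstream = lambda m: ((m(x) - 2 * x) ** 2).mean()
    print(pretrain_episode(model, tasks, downstream, k=1)[0])
```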
Related papers
- Pretext Tasks selection for multitask self-supervised speech representation learning [23.39079406674442]
This paper introduces a method to select a group of pretext tasks among a set of candidates.
Experiments conducted on speaker recognition and automatic speech recognition validate our approach.
arXiv Detail & Related papers (2021-07-01T16:36:29Z)
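The summary above does not state the selection criterion, so the sketch below only illustrates the generic shape of the problem: score every candidate group of pretext tasks with some utility estimate and keep the best group. The utility function and task names are made up for illustration.

```python
from itertools import combinations

def select_pretext_group(candidates, utility, group_size=2):
    """Hypothetical selection rule: exhaustively score every group of
    pretext tasks with a user-supplied utility estimate, keep the best."""
    best_group, best_score = None, float("-inf")
    for group in combinations(candidates, group_size):
        score = utility(group)
        if score > best_score:
            best_group, best_score = group, score
    return best_group, best_score

if __name__ == "__main__":
    # Toy utility: pretend each task has a fixed estimated benefit and
    # redundant tasks (sharing a name prefix) add less when combined.
    benefit = {"pitch": 0.4, "energy": 0.35, "pitch_delta": 0.38, "mfcc": 0.3}
    def utility(group):
        score = sum(benefit[t] for t in group)
        if len({t.split("_")[0] for t in group}) < len(group):
            score -= 0.2  # penalize redundant pretext tasks
        return score
    print(select_pretext_group(benefit, utility, group_size=2))
```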
- Training ELECTRA Augmented with Multi-word Selection [53.77046731238381]
We present a new text encoder pre-training method that improves ELECTRA based on multi-task learning.
Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets.
arXiv Detail & Related papers (2021-05-31T23:19:00Z)
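A rough sketch of the two objectives named above, replaced-token detection plus selecting the original token from a candidate set, is given below. The module layout, sizes, and equal loss weighting are assumptions; only the idea of training one discriminator on both tasks follows the summary.

```python
import torch
import torch.nn.functional as F
from torch import nn

class MultiTaskDiscriminator(nn.Module):
    """Toy discriminator head with two objectives: (1) replaced-token
    detection and (2) picking the original token out of a candidate set."""
    def __init__(self, vocab_size=100, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.detect_head = nn.Linear(hidden, 1)   # replaced vs. original

    def forward(self, tokens, candidates):
        h, _ = self.encoder(self.embed(tokens))            # (B, T, H)
        detect_logits = self.detect_head(h).squeeze(-1)    # (B, T)
        cand_emb = self.embed(candidates)                  # (B, T, K, H)
        # Score each candidate token against the contextual hidden state.
        select_logits = torch.einsum("bth,btkh->btk", h, cand_emb)
        return detect_logits, select_logits

def joint_loss(detect_logits, select_logits, replaced, original_idx):
    detection = F.binary_cross_entropy_with_logits(detect_logits, replaced.float())
    selection = F.cross_entropy(select_logits.flatten(0, 1), original_idx.flatten())
    return detection + selection

if __name__ == "__main__":
    B, T, K = 2, 5, 4
    model = MultiTaskDiscriminator()
    tokens = torch.randint(0, 100, (B, T))
    candidates = torch.randint(0, 100, (B, T, K))  # candidate sets per position
    replaced = torch.randint(0, 2, (B, T))         # 1 if the token was replaced
    original_idx = torch.randint(0, K, (B, T))     # index of the original token
    print(joint_loss(*model(tokens, candidates), replaced, original_idx))
```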
- Probabilistic Active Meta-Learning [15.432006404678981]
We introduce task selection based on prior experience into a meta-learning algorithm.
We provide empirical evidence that our approach improves data-efficiency when compared to strong baselines on simulated robotic experiments.
arXiv Detail & Related papers (2020-07-17T12:51:42Z)
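The summary only says that task selection uses prior experience, so the sketch below substitutes a simple novelty-based acquisition rule over task descriptors as a stand-in; the kernel score and the descriptors are illustrative assumptions, not the paper's probabilistic model.

```python
import math

def novelty_score(candidate, seen, length_scale=1.0):
    """Hypothetical acquisition rule: prefer tasks far from prior experience,
    measured by kernel similarity to already-seen task descriptors."""
    if not seen:
        return 1.0
    sims = [math.exp(-sum((c - s) ** 2 for c, s in zip(candidate, prev))
                     / (2 * length_scale ** 2)) for prev in seen]
    return 1.0 - max(sims)   # low similarity to every seen task = high novelty

def select_next_task(candidates, seen):
    return max(candidates, key=lambda name: novelty_score(candidates[name], seen))

if __name__ == "__main__":
    # Toy 2-D task descriptors (e.g., robot mass and friction in simulation).
    candidates = {"light_slippery": (0.2, 0.1), "heavy_rough": (0.9, 0.8),
                  "medium": (0.5, 0.5)}
    seen = [(0.25, 0.15), (0.45, 0.5)]            # tasks already trained on
    print(select_next_task(candidates, seen))     # picks the most novel task
```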
- MC-BERT: Efficient Language Pre-Training via a Meta Controller [96.68140474547602]
Large-scale pre-training is computationally expensive.
ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator.
We propose a novel meta-learning framework, MC-BERT, to achieve better efficiency and effectiveness.
arXiv Detail & Related papers (2020-06-10T09:22:19Z) - Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less
Forgetting [66.45372974713189]
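The summary describes ELECTRA's replaced-token detection, which MC-BERT builds on. The toy sketch below shows only that detection objective (a generator corrupts masked positions and a discriminator labels each token as replaced or original); it does not implement MC-BERT's meta controller, and all sizes and modules are illustrative.

```python
import torch
import torch.nn.functional as F
from torch import nn

# Toy sketch of ELECTRA-style replaced-token detection referenced above.
V, H, B, T = 50, 16, 4, 8

generator = nn.Sequential(nn.Embedding(V, H), nn.Linear(H, V))
disc_embed = nn.Embedding(V, H)
disc_head = nn.Linear(H, 1)

tokens = torch.randint(0, V, (B, T))
mask = torch.rand(B, T) < 0.15                      # positions to corrupt

with torch.no_grad():                               # sample replacement tokens
    gen_logits = generator(tokens)                  # (B, T, V)
    sampled = torch.distributions.Categorical(logits=gen_logits).sample()
corrupted = torch.where(mask, sampled, tokens)
is_replaced = (corrupted != tokens).float()         # labels for the discriminator

disc_logits = disc_head(disc_embed(corrupted)).squeeze(-1)
loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)
print(float(loss))
```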
- Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting [66.45372974713189]
We propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks.
Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark.
We provide the open-source RecAdam optimizer, which integrates the proposed mechanisms into Adam to facilitate adoption by the NLP community.
arXiv Detail & Related papers (2020-04-27T08:59:57Z)
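A hedged sketch of the recall-and-learn idea follows, assuming the "recall" signal can be approximated by a quadratic pull toward the pretrained weights mixed into an ordinary Adam fine-tuning step with an annealed weight. This is an illustration of the joint pretraining/downstream objective described above, not the released RecAdam implementation.

```python
import torch
from torch import nn

def recall_finetune_step(model, pretrained, opt, task_loss, recall_weight):
    """One fine-tuning step with an assumed 'recall' term: the task loss is
    mixed with a quadratic pull toward the pretrained parameters, so the
    downstream update forgets less of the pretraining knowledge."""
    recall = sum(((p - p0) ** 2).sum()
                 for p, p0 in zip(model.parameters(), pretrained))
    loss = (1 - recall_weight) * task_loss(model) + recall_weight * recall
    opt.zero_grad()
    loss.backward()
    opt.step()
    return float(loss)

if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Linear(4, 2)
    pretrained = [p.detach().clone() for p in model.parameters()]
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    x, y = torch.randn(16, 4), torch.randint(0, 2, (16,))
    task_loss = lambda m: nn.functional.cross_entropy(m(x), y)
    for step in range(5):
        # Annealing (an assumption): shift weight from recall to the task loss.
        w = 0.5 * 0.8 ** step
        print(recall_finetune_step(model, pretrained, opt, task_loss, w))
```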
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
arXiv Detail & Related papers (2020-04-23T04:21:19Z)
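The two continued-pretraining phases described above can be pictured as the same masked-LM loop run first on a broad domain corpus and then on the task's unlabeled text, before fine-tuning. The sketch below uses a toy model and random token data; the masking rate and model are assumptions.

```python
import torch
from torch import nn

def mlm_step(model, opt, batch, mask_id=0, mask_prob=0.15):
    """Toy masked-LM step used for both continued-pretraining phases below."""
    mask = torch.rand(batch.shape) < mask_prob
    if not mask.any():
        return 0.0
    corrupted = batch.masked_fill(mask, mask_id)
    logits = model(corrupted)
    loss = nn.functional.cross_entropy(logits[mask], batch[mask])
    opt.zero_grad()
    loss.backward()
    opt.step()
    return float(loss)

if __name__ == "__main__":
    V, H = 100, 32
    model = nn.Sequential(nn.Embedding(V, H), nn.Linear(H, V))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    domain_corpus = torch.randint(1, V, (64, 16))     # unlabeled in-domain text
    task_corpus = torch.randint(1, V, (16, 16))       # the task's unlabeled text
    # Phase 1: domain-adaptive pretraining on the broad domain corpus.
    for batch in domain_corpus.split(8):
        mlm_step(model, opt, batch)
    # Phase 2: task-adaptive pretraining on the (smaller) task corpus.
    for batch in task_corpus.split(8):
        mlm_step(model, opt, batch)
    # Fine-tuning on labeled task data would follow here.
```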
- Train No Evil: Selective Masking for Task-Guided Pre-Training [97.03615486457065]
We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance with less than 50% of cost.
arXiv Detail & Related papers (2020-04-21T03:14:22Z)
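A small sketch of selective masking is below, assuming per-token importance scores are already available (here a random table stands in for whatever task-guided scoring the paper uses): instead of masking uniformly at random, the highest-scoring tokens are masked up to a budget.

```python
import torch

def selective_mask(batch, importance, mask_id=0, budget=0.15):
    """Sketch of selective masking: rather than masking uniformly at random,
    mask the tokens with the highest task-importance scores, up to a budget."""
    scores = importance[batch]                       # (B, T) per-token importance
    k = max(1, int(budget * batch.shape[1]))
    topk = scores.topk(k, dim=1).indices             # positions to mask per sequence
    mask = torch.zeros_like(batch, dtype=torch.bool).scatter_(1, topk, True)
    return batch.masked_fill(mask, mask_id), mask

if __name__ == "__main__":
    V = 50
    # Hypothetical importance table, e.g. estimated from the downstream task.
    importance = torch.rand(V)
    batch = torch.randint(1, V, (4, 20))
    corrupted, mask = selective_mask(batch, importance)
    print(mask.sum(dim=1))   # 3 tokens (15% of 20) masked in each sequence
```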
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
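To make the stated connection between multi-task pre-training and model-agnostic meta-learning concrete, the sketch below runs a first-order meta-train step: adapt a clone on a support objective, evaluate it on a query objective, and push the resulting gradient back to the shared parameters. The first-order approximation and the toy losses are assumptions, not the paper's exact procedure.

```python
import copy
import torch
from torch import nn

def meta_train_step(model, meta_opt, support_loss, query_loss, inner_lr=1e-2):
    """First-order meta-train step (FOMAML-style sketch)."""
    fast = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    inner_opt.zero_grad()
    support_loss(fast).backward()      # inner step: adapt to the support objective
    inner_opt.step()
    inner_opt.zero_grad()
    query_loss(fast).backward()        # outer signal: loss after adaptation
    for p, fp in zip(model.parameters(), fast.parameters()):
        p.grad = fp.grad.clone()       # first-order approximation of the meta-gradient
    meta_opt.step()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Linear(6, 3)
    meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    xs, ys = torch.randn(32, 6), torch.randint(0, 3, (32,))
    xq, yq = torch.randn(32, 6), torch.randint(0, 3, (32,))
    support = lambda m: nn.functional.cross_entropy(m(xs), ys)
    query = lambda m: nn.functional.cross_entropy(m(xq), yq)
    for _ in range(3):
        meta_train_step(model, meta_opt, support, query)
    print(float(query(model)))         # query loss of the meta-updated model
```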
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including the generated summaries) and is not responsible for any consequences of its use.