Should We Be Pre-training? An Argument for End-task Aware Training as an Alternative
- URL: http://arxiv.org/abs/2109.07437v1
- Date: Wed, 15 Sep 2021 17:13:18 GMT
- Title: Should We Be Pre-training? An Argument for End-task Aware Training as an Alternative
- Authors: Lucio M. Dery, Paul Michel, Ameet Talwalkar and Graham Neubig
- Abstract summary: In general, the pre-training step relies on little to no direct knowledge of the task on which the model will be fine-tuned.
We show that multi-tasking the end-task and auxiliary objectives results in significantly better downstream task performance.
- Score: 88.11465517304515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training, where models are trained on an auxiliary objective with
abundant data before being fine-tuned on data from the downstream task, is now
the dominant paradigm in NLP. In general, the pre-training step relies on
little to no direct knowledge of the task on which the model will be
fine-tuned, even when the end-task is known in advance. Our work challenges
this status quo of end-task agnostic pre-training. First, on three different
low-resource NLP tasks from two domains, we demonstrate that multi-tasking the
end-task and auxiliary objectives results in significantly better downstream
task performance than the widely-used task-agnostic continued pre-training
paradigm of Gururangan et al. (2020). We next introduce an online meta-learning
algorithm that learns a set of multi-task weights to better balance among our
multiple auxiliary objectives, achieving further improvements on end task
performance and data efficiency.
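The sketch below illustrates the training recipe described in the abstract: the end-task loss and several auxiliary losses are minimized jointly under learned task weights. It is a minimal, assumption-laden sketch, not the paper's implementation: the `end_task_loss` and `aux_loss` methods are hypothetical stand-ins, and the gradient-similarity weight update is a simple heuristic substituted for the paper's online meta-learning algorithm.

```python
import torch

def training_step(model, optimizer, end_batch, aux_batches, log_weights):
    """Jointly minimize the end-task loss and the weighted auxiliary losses."""
    weights = torch.softmax(log_weights, dim=0)           # [1 + num_aux] normalized task weights
    losses = [model.end_task_loss(end_batch)]             # supervised end-task objective (hypothetical API)
    losses += [model.aux_loss(i, b) for i, b in enumerate(aux_batches)]  # e.g. auxiliary MLM objectives
    total = sum(w * l for w, l in zip(weights, losses))   # weighted multi-task loss
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return [l.item() for l in losses]

def _flat_grad(params):
    """Concatenate the current gradients of `params` into one vector."""
    return torch.cat([p.grad.detach().flatten() for p in params if p.grad is not None])

def update_task_weights(model, end_batch, aux_batches, log_weights, lr=0.1):
    """Heuristic online update: upweight auxiliary tasks whose gradients align
    with the end-task gradient (a stand-in for the paper's meta-learned weights)."""
    params = [p for p in model.parameters() if p.requires_grad]

    model.zero_grad()
    model.end_task_loss(end_batch).backward()
    g_end = _flat_grad(params)

    sims = [torch.tensor(1.0)]                            # the end-task itself keeps a fixed positive score
    for i, batch in enumerate(aux_batches):
        model.zero_grad()
        model.aux_loss(i, batch).backward()
        sims.append(torch.nn.functional.cosine_similarity(g_end, _flat_grad(params), dim=0))

    with torch.no_grad():
        log_weights += lr * torch.stack(sims)             # nudge weights toward helpful auxiliary tasks
    model.zero_grad()
```

In practice, `log_weights` can simply be initialized as `torch.zeros(1 + len(aux_batches))`, and the auxiliary objectives would typically be masked-language-modeling losses over in-domain and task-specific unlabeled text.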
Related papers
- $\alpha$VIL: Learning to Leverage Auxiliary Tasks for Multitask Learning [3.809702129519642]
Multitask Learning aims to train a range of (usually related) tasks with the help of a shared model.
It becomes important to estimate the positive or negative influence auxiliary tasks will have on the target.
We propose a novel method, $\alpha$ Variable Importance Learning ($\alpha$VIL), that adjusts task weights dynamically during model training.
arXiv Detail & Related papers (2024-05-13T14:12:33Z) - Learning to Modulate pre-trained Models in RL [22.812215561012874]
Fine-tuning a pre-trained model often suffers from catastrophic forgetting.
Our study shows that with most fine-tuning approaches, the performance on pre-training tasks deteriorates significantly.
We propose a novel method, Learning-to-Modulate (L2M), that avoids the degradation of learned skills by modulating the information flow of the frozen pre-trained model.
arXiv Detail & Related papers (2023-06-26T17:53:05Z) - Effective Adaptation in Multi-Task Co-Training for Unified Autonomous
Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their transfer performance is sub-optimal or even lags far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z) - Diverse Distributions of Self-Supervised Tasks for Meta-Learning in NLP [39.457091182683406]
We aim to provide task distributions for meta-learning by considering self-supervised tasks automatically proposed from unlabeled text.
Our analysis shows that the considered design factors meaningfully alter the task distribution, with some inducing significant improvements in the downstream few-shot accuracy of the meta-learned models.
arXiv Detail & Related papers (2021-11-02T01:50:09Z) - Meta-learning with an Adaptive Task Scheduler [93.63502984214918]
Existing meta-learning algorithms randomly sample meta-training tasks with a uniform probability.
Given a limited number of meta-training tasks, some sampled tasks are likely to be detrimental, for example because they are noisy or imbalanced.
We propose an adaptive task scheduler (ATS) for the meta-training process.
arXiv Detail & Related papers (2021-10-26T22:16:35Z) - Different Strokes for Different Folks: Investigating Appropriate Further
Pre-training Approaches for Diverse Dialogue Tasks [18.375585982984845]
We show that different downstream tasks prefer different further pre-training tasks, and that the two are intrinsically correlated.
Our investigation indicates that designing appropriate further pre-training tasks is both important and effective.
arXiv Detail & Related papers (2021-09-14T08:42:50Z) - Measuring and Harnessing Transference in Multi-Task Learning [58.48659733262734]
Multi-task learning can leverage information learned by one task to benefit the training of other tasks.
We analyze the dynamics of information transfer, or transference, across tasks throughout training.
arXiv Detail & Related papers (2020-10-29T08:25:43Z) - Auxiliary Task Reweighting for Minimum-data Learning [118.69683270159108]
Supervised learning requires a large amount of training data, limiting its application where labeled data is scarce.
To compensate for data scarcity, one possible method is to utilize auxiliary tasks to provide additional supervision for the main task.
We propose a method to automatically reweight auxiliary tasks in order to reduce the data requirement on the main task.
arXiv Detail & Related papers (2020-10-16T08:45:37Z) - Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
arXiv Detail & Related papers (2020-04-23T04:21:19Z)