Intermediate-Task Transfer Learning with Pretrained Models for Natural
Language Understanding: When and Why Does It Work?
- URL: http://arxiv.org/abs/2005.00628v2
- Date: Sat, 9 May 2020 05:23:02 GMT
- Title: Intermediate-Task Transfer Learning with Pretrained Models for Natural
Language Understanding: When and Why Does It Work?
- Authors: Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi
Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman
- Abstract summary: It is poorly understood when and why intermediate-task training is beneficial for a given target task.
We perform a large-scale study on the pretrained RoBERTa model with 110 intermediate-target task combinations.
We observe that intermediate tasks requiring high-level inference and reasoning abilities tend to work best.
- Score: 44.88358841370665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While pretrained models such as BERT have shown large gains across natural
language understanding tasks, their performance can be improved by further
training the model on a data-rich intermediate task, before fine-tuning it on a
target task. However, it is still poorly understood when and why
intermediate-task training is beneficial for a given target task. To
investigate this, we perform a large-scale study on the pretrained RoBERTa
model with 110 intermediate-target task combinations. We further evaluate all
trained models with 25 probing tasks meant to reveal the specific skills that
drive transfer. We observe that intermediate tasks requiring high-level
inference and reasoning abilities tend to work best. We also observe that
target task performance is strongly correlated with higher-level abilities such
as coreference resolution. However, we fail to observe more granular
correlations between probing and target task performance, highlighting the need
for further work on broad-coverage probing benchmarks. We also observe evidence
that the forgetting of knowledge learned during pretraining may limit our
analysis, highlighting the need for further work on transfer learning methods
in these settings.
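As a rough illustration of the recipe the abstract describes (not the authors' own experimental pipeline), the Python sketch below fine-tunes RoBERTa on a data-rich intermediate task and then continues fine-tuning that checkpoint on a target task with a fresh classification head. The model names, placeholder data, and the `fine_tune` helper are illustrative assumptions; a real run would use full datasets, batching, and evaluation.

```python
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

def fine_tune(model, texts, labels, epochs=1, lr=1e-5):
    """Toy fine-tuning loop (one example per step, no eval) for illustration only."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for text, label in zip(texts, labels):
            enc = tokenizer(text, return_tensors="pt", truncation=True)
            loss = model(**enc, labels=torch.tensor([label])).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model

# Stage 1: intermediate-task training on a data-rich task (placeholder data).
model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=3)
fine_tune(model, ["A premise sentence. A hypothesis sentence."], [0])
model.save_pretrained("roberta-intermediate")
tokenizer.save_pretrained("roberta-intermediate")

# Stage 2: fine-tune the intermediate checkpoint on the target task.
# ignore_mismatched_sizes lets a fresh 2-way head replace the 3-way intermediate head.
target_model = RobertaForSequenceClassification.from_pretrained(
    "roberta-intermediate", num_labels=2, ignore_mismatched_sizes=True
)
fine_tune(target_model, ["A target-task example."], [1])
```

The same two-stage structure also accommodates the paper's probing step: each trained checkpoint can additionally be evaluated on probing tasks to see which skills the intermediate task instilled.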
Related papers
- Understanding the Transferability of Representations via Task-Relatedness [8.425690424016986]
We propose a novel analysis of the transferability of pre-trained model representations to downstream tasks in terms of their relatedness to a given reference task.
Our experiments using state-of-the-art pre-trained models show the effectiveness of task-relatedness in explaining transferability on various vision and language tasks.
arXiv Detail & Related papers (2023-07-03T08:06:22Z)
- CrossCodeBench: Benchmarking Cross-Task Generalization of Source Code Models [33.78307982736911]
Cross-task generalization has strong research and application value.
We propose a large-scale benchmark that includes 216 existing code-related tasks.
arXiv Detail & Related papers (2023-02-08T13:04:52Z)
- An Exploration of Data Efficiency in Intra-Dataset Task Transfer for Dialog Understanding [65.75873687351553]
This study explores the effects of varying quantities of target task training data on sequential transfer learning in the dialog domain.
Counterintuitively, our data shows that the size of the target-task training set often has minimal effect on how sequential transfer learning performs compared with the same model trained without transfer learning.
arXiv Detail & Related papers (2022-10-21T04:36:46Z)
- Composite Learning for Robust and Effective Dense Predictions [81.2055761433725]
Multi-task learning promises better model generalization on a target task by jointly optimizing it with an auxiliary task.
We find that jointly training a dense prediction (target) task with a self-supervised (auxiliary) task can consistently improve the performance of the target task, while eliminating the need for labeling auxiliary tasks.
arXiv Detail & Related papers (2022-10-13T17:59:16Z)
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as the strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
- Different Strokes for Different Folks: Investigating Appropriate Further Pre-training Approaches for Diverse Dialogue Tasks [18.375585982984845]
We show that different downstream tasks prefer different further pre-training tasks, and that the two are intrinsically correlated.
Our investigation indicates that designing appropriate further pre-training tasks is both important and effective.
arXiv Detail & Related papers (2021-09-14T08:42:50Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- Exploring and Predicting Transferability across NLP Tasks [115.6278033699853]
We study the transferability between 33 NLP tasks across three broad classes of problems.
Our results show that transfer learning is more beneficial than previously thought.
We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task, as illustrated in the sketch after this entry.
arXiv Detail & Related papers (2020-05-02T09:39:36Z)
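As a minimal sketch of that last idea, candidate source (intermediate) tasks can be ranked by cosine similarity between their task embeddings and the target task's embedding. The vectors below are arbitrary placeholders, not the embeddings the cited paper actually computes from the model; only the ranking step is illustrated.

```python
import numpy as np

# Placeholder task embeddings; the cited paper derives these from the model itself.
source_embeddings = {
    "mnli": np.array([0.9, 0.1, 0.3]),
    "squad": np.array([0.2, 0.8, 0.5]),
    "cosmosqa": np.array([0.7, 0.3, 0.6]),
}
target_embedding = np.array([0.8, 0.2, 0.5])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank candidate source tasks by predicted transferability to the target task.
ranking = sorted(source_embeddings,
                 key=lambda t: cosine(source_embeddings[t], target_embedding),
                 reverse=True)
print(ranking)
```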