When to Use Multi-Task Learning vs Intermediate Fine-Tuning for
Pre-Trained Encoder Transfer Learning
- URL: http://arxiv.org/abs/2205.08124v1
- Date: Tue, 17 May 2022 06:48:45 GMT
- Title: When to Use Multi-Task Learning vs Intermediate Fine-Tuning for
Pre-Trained Encoder Transfer Learning
- Authors: Orion Weller, Kevin Seppi, Matt Gardner
- Abstract summary: Transfer learning (TL) in natural language processing has seen a surge of interest in recent years.
Three main strategies have emerged for making use of multiple supervised datasets during fine-tuning.
We compare all three TL methods in a comprehensive analysis on the GLUE dataset suite.
- Score: 15.39115079099451
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer learning (TL) in natural language processing (NLP) has seen a surge
of interest in recent years, as pre-trained models have shown an impressive
ability to transfer to novel tasks. Three main strategies have emerged for
making use of multiple supervised datasets during fine-tuning: training on an
intermediate task before training on the target task (STILTs), using multi-task
learning (MTL) to train jointly on a supplementary task and the target task
(pairwise MTL), or simply using MTL to train jointly on all available datasets
(MTL-ALL). In this work, we compare all three TL methods in a comprehensive
analysis on the GLUE dataset suite. We find that there is a simple heuristic
for when to use one of these techniques over the other: pairwise MTL is better
than STILTs when the target task has fewer instances than the supporting task
and vice versa. We show that this holds true in more than 92% of applicable
cases on the GLUE dataset and validate this hypothesis with experiments varying
dataset size. The simplicity and effectiveness of this heuristic are surprising
and warrant additional exploration by the TL community. Furthermore, we find
that MTL-ALL is worse than the pairwise methods in almost every case. We hope
this study will aid others as they choose between TL methods for NLP tasks.
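To make the comparison concrete, the following minimal, self-contained Python sketch illustrates the three strategies and the size-based heuristic described above. The fine_tune and multitask_train helpers are illustrative stand-ins that only record a training schedule, not the authors' implementation, and the RTE/MNLI training-set sizes in the usage example are approximate.
```python
# Illustrative sketch (not the paper's code): the three transfer-learning
# strategies compared in the paper, plus the size-based heuristic for
# choosing between STILTs and pairwise MTL. The training helpers are
# stand-ins that only record a schedule of training steps.

def fine_tune(schedule, dataset):
    # Stand-in for fine-tuning a pre-trained encoder on a single dataset.
    return schedule + [f"fine-tune on {dataset}"]

def multitask_train(schedule, datasets):
    # Stand-in for joint multi-task training with a shared encoder
    # and one task-specific head per dataset.
    return schedule + ["jointly train on " + ", ".join(datasets)]

def stilts(supporting_task, target_task):
    """STILTs: fine-tune on the intermediate (supporting) task,
    then fine-tune on the target task."""
    return fine_tune(fine_tune([], supporting_task), target_task)

def pairwise_mtl(supporting_task, target_task):
    """Pairwise MTL: train jointly on the supporting and target tasks."""
    return multitask_train([], [supporting_task, target_task])

def mtl_all(all_tasks):
    """MTL-ALL: train jointly on all available datasets."""
    return multitask_train([], all_tasks)

def choose_strategy(n_target, n_supporting):
    """Heuristic reported in the paper (holds in >92% of applicable GLUE
    cases): prefer pairwise MTL when the target task has fewer training
    instances than the supporting task, and STILTs otherwise."""
    return "pairwise MTL" if n_target < n_supporting else "STILTs"

# Usage with approximate GLUE training-set sizes: RTE (~2.5k examples)
# as the target task and MNLI (~393k examples) as the supporting task.
print(choose_strategy(2_500, 393_000))  # -> pairwise MTL
print(pairwise_mtl("MNLI", "RTE"))      # -> ['jointly train on MNLI, RTE']
print(stilts("MNLI", "RTE"))            # -> ['fine-tune on MNLI', 'fine-tune on RTE']
```
With the roles reversed (a large target task supported by a smaller dataset), the same heuristic returns STILTs, matching the paper's finding.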
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic [6.46176287368784]
We propose Model Exclusive Task Arithmetic for merging GPT-scale models.
Our proposed MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs.
arXiv Detail & Related papers (2024-06-17T10:12:45Z)
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful with classification tasks with little, or non-overlapping annotations.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- Multitask Learning Can Improve Worst-Group Outcomes [76.92646345152788]
Multitask learning (MTL) is one such widely used technique.
We propose to modify standard MTL by regularizing the joint multitask representation space.
We find that our regularized MTL approach consistently outperforms JTT on both average and worst-group outcomes.
arXiv Detail & Related papers (2023-12-05T21:38:24Z)
- Task Grouping for Automated Multi-Task Machine Learning via Task Affinity Prediction [7.975047833725489]
Multi-task learning (MTL) models can attain significantly higher accuracy than single-task learning (STL) models.
In this paper, we propose a novel automated approach for task grouping.
We identify inherent task features and STL characteristics that can help us to predict whether a group of tasks should be learned together using MTL or if they should be learned independently using STL.
arXiv Detail & Related papers (2023-10-24T23:29:46Z)
- Multi-Task Cooperative Learning via Searching for Flat Minima [8.835287696319641]
We propose to formulate MTL as a multi/bi-level optimization problem, and therefore force features to learn from each task in a cooperative approach.
Specifically, we update the sub-model for each task alternatively taking advantage of the learned sub-models of the other tasks.
To alleviate the negative transfer problem during the optimization, we search for flat minima for the current objective function.
arXiv Detail & Related papers (2023-09-21T14:00:11Z)
- When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review [7.776434991976473]
Multi-Task Learning (MTL) aims to learn multiple tasks simultaneously while exploiting their mutual relationships.
This review focuses on how MTL could be utilised under different partial supervision settings to address these challenges.
arXiv Detail & Related papers (2023-07-25T20:08:41Z)
- Multi-Task Learning as a Bargaining Game [63.49888996291245]
In Multi-task learning (MTL), a joint model is trained to simultaneously make predictions for several tasks.
Since the gradients of these different tasks may conflict, training a joint model for MTL often yields lower performance than its corresponding single-task counterparts.
We propose viewing the gradients combination step as a bargaining game, where tasks negotiate to reach an agreement on a joint direction of parameter update.
arXiv Detail & Related papers (2022-02-02T13:21:53Z)
- Semi-supervised Multi-task Learning for Semantics and Depth [88.77716991603252]
Multi-Task Learning (MTL) aims to enhance the model generalization by sharing representations between related tasks for better performance.
We propose a semi-supervised multi-task learning method to leverage the available supervisory signals from different datasets.
We present a domain-aware discriminator structure with various alignment formulations to mitigate the domain discrepancy issue among datasets.
arXiv Detail & Related papers (2021-10-14T07:43:39Z)
- Multi-Task Learning in Natural Language Processing: An Overview [12.011509222628055]
Multi-Task Learning (MTL) can leverage useful information of related tasks to achieve simultaneous performance improvement on these tasks.
We first review MTL architectures used in NLP tasks and categorize them into four classes, including parallel architecture, hierarchical architecture, modular architecture, and generative adversarial architecture.
We present optimization techniques on loss construction, gradient regularization, data sampling, and task scheduling to properly train a multi-task model.
arXiv Detail & Related papers (2021-09-19T14:51:51Z)
- Multi-Task Learning for Dense Prediction Tasks: A Survey [87.66280582034838]
Multi-task learning (MTL) techniques have shown promising results w.r.t. performance, computations and/or memory footprint.
We provide a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision.
arXiv Detail & Related papers (2020-04-28T09:15:50Z)