When to Use Multi-Task Learning vs Intermediate Fine-Tuning for
Pre-Trained Encoder Transfer Learning
- URL: http://arxiv.org/abs/2205.08124v1
- Date: Tue, 17 May 2022 06:48:45 GMT
- Title: When to Use Multi-Task Learning vs Intermediate Fine-Tuning for
Pre-Trained Encoder Transfer Learning
- Authors: Orion Weller, Kevin Seppi, Matt Gardner
- Abstract summary: Transfer learning (TL) in natural language processing has seen a surge of interest in recent years.
Three main strategies have emerged for making use of multiple supervised datasets during fine-tuning.
We compare all three TL methods in a comprehensive analysis on the GLUE dataset suite.
- Score: 15.39115079099451
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer learning (TL) in natural language processing (NLP) has seen a surge
of interest in recent years, as pre-trained models have shown an impressive
ability to transfer to novel tasks. Three main strategies have emerged for
making use of multiple supervised datasets during fine-tuning: training on an
intermediate task before training on the target task (STILTs), using multi-task
learning (MTL) to train jointly on a supplementary task and the target task
(pairwise MTL), or simply using MTL to train jointly on all available datasets
(MTL-ALL). In this work, we compare all three TL methods in a comprehensive
analysis on the GLUE dataset suite. We find that there is a simple heuristic
for when to use one of these techniques over the other: pairwise MTL is better
than STILTs when the target task has fewer instances than the supporting task
and vice versa. We show that this holds true in more than 92% of applicable
cases on the GLUE dataset and validate this hypothesis with experiments varying
dataset size. The simplicity and effectiveness of this heuristic are surprising
and warrant additional exploration by the TL community. Furthermore, we find
that MTL-ALL is worse than the pairwise methods in almost every case. We hope
this study will aid others as they choose between TL methods for NLP tasks.
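To make the comparison concrete, the following minimal, self-contained Python sketch illustrates the three strategies and the size-based heuristic described above. The fine_tune and multitask_train helpers are illustrative stand-ins that only record a training schedule, not the authors' implementation, and the RTE/MNLI training-set sizes in the usage example are approximate.
```python
# Illustrative sketch (not the paper's code): the three transfer-learning
# strategies compared in the paper, plus the size-based heuristic for
# choosing between STILTs and pairwise MTL. The training helpers are
# stand-ins that only record a schedule of training steps.

def fine_tune(schedule, dataset):
    # Stand-in for fine-tuning a pre-trained encoder on a single dataset.
    return schedule + [f"fine-tune on {dataset}"]

def multitask_train(schedule, datasets):
    # Stand-in for joint multi-task training with a shared encoder
    # and one task-specific head per dataset.
    return schedule + ["jointly train on " + ", ".join(datasets)]

def stilts(supporting_task, target_task):
    """STILTs: fine-tune on the intermediate (supporting) task,
    then fine-tune on the target task."""
    return fine_tune(fine_tune([], supporting_task), target_task)

def pairwise_mtl(supporting_task, target_task):
    """Pairwise MTL: train jointly on the supporting and target tasks."""
    return multitask_train([], [supporting_task, target_task])

def mtl_all(all_tasks):
    """MTL-ALL: train jointly on all available datasets."""
    return multitask_train([], all_tasks)

def choose_strategy(n_target, n_supporting):
    """Heuristic reported in the paper (holds in >92% of applicable GLUE
    cases): prefer pairwise MTL when the target task has fewer training
    instances than the supporting task, and STILTs otherwise."""
    return "pairwise MTL" if n_target < n_supporting else "STILTs"

# Usage with approximate GLUE training-set sizes: RTE (~2.5k examples)
# as the target task and MNLI (~393k examples) as the supporting task.
print(choose_strategy(2_500, 393_000))  # -> pairwise MTL
print(pairwise_mtl("MNLI", "RTE"))      # -> ['jointly train on MNLI, RTE']
print(stilts("MNLI", "RTE"))            # -> ['fine-tune on MNLI', 'fine-tune on RTE']
```
With the roles reversed (a large target task supported by a smaller dataset), the same heuristic returns STILTs, matching the paper's finding.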
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic [6.46176287368784]
We propose Model Exclusive Task Arithmetic for merging GPT-scale models.
Our proposed MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs.
arXiv Detail & Related papers (2024-06-17T10:12:45Z)
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful with classification tasks with little, or non-overlapping annotations.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- Multitask Learning Can Improve Worst-Group Outcomes [76.92646345152788]
Multitask learning (MTL) is one such widely used technique.
We propose to modify standard MTL by regularizing the joint multitask representation space.
We find that our regularized MTL approach consistently outperforms JTT on both average and worst-group outcomes.
arXiv Detail & Related papers (2023-12-05T21:38:24Z)
- Task Grouping for Automated Multi-Task Machine Learning via Task Affinity Prediction [7.975047833725489]
Multi-task learning (MTL) models can attain significantly higher accuracy than single-task learning (STL) models.
In this paper, we propose a novel automated approach for task grouping.
We identify inherent task features and STL characteristics that can help us to predict whether a group of tasks should be learned together using MTL or if they should be learned independently using STL.
arXiv Detail & Related papers (2023-10-24T23:29:46Z)
- Multi-Task Cooperative Learning via Searching for Flat Minima [8.835287696319641]
We propose to formulate MTL as a multi/bi-level optimization problem, and therefore force features to learn from each task in a cooperative approach.
Specifically, we update the sub-model for each task alternatively taking advantage of the learned sub-models of the other tasks.
To alleviate the negative transfer problem during the optimization, we search for flat minima for the current objective function.
arXiv Detail & Related papers (2023-09-21T14:00:11Z)
- When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review [7.776434991976473]
Multi-Task Learning (MTL) aims to learn multiple tasks simultaneously while exploiting their mutual relationships.
This review focuses on how MTL could be utilised under different partial supervision settings to address these challenges.
arXiv Detail & Related papers (2023-07-25T20:08:41Z)
- Multi-Task Learning as a Bargaining Game [63.49888996291245]
In Multi-task learning (MTL), a joint model is trained to simultaneously make predictions for several tasks.
Since the gradients of these different tasks may conflict, training a joint model for MTL often yields lower performance than its corresponding single-task counterparts.
We propose viewing the gradients combination step as a bargaining game, where tasks negotiate to reach an agreement on a joint direction of parameter update.
arXiv Detail & Related papers (2022-02-02T13:21:53Z)
- Semi-supervised Multi-task Learning for Semantics and Depth [88.77716991603252]
Multi-Task Learning (MTL) aims to enhance the model generalization by sharing representations between related tasks for better performance.
We propose a semi-supervised multi-task learning method to leverage the available supervisory signals from different datasets.
We present a domain-aware discriminator structure with various alignment formulations to mitigate the domain discrepancy issue among datasets.
arXiv Detail & Related papers (2021-10-14T07:43:39Z)
- Multi-Task Learning in Natural Language Processing: An Overview [12.011509222628055]
Multi-Task Learning (MTL) can leverage useful information of related tasks to achieve simultaneous performance improvement on these tasks.
We first review MTL architectures used in NLP tasks and categorize them into four classes, including parallel architecture, hierarchical architecture, modular architecture, and generative adversarial architecture.
We present optimization techniques on loss construction, gradient regularization, data sampling, and task scheduling to properly train a multi-task model.
arXiv Detail & Related papers (2021-09-19T14:51:51Z)
- Multi-Task Learning for Dense Prediction Tasks: A Survey [87.66280582034838]
Multi-task learning (MTL) techniques have shown promising results w.r.t. performance, computations and/or memory footprint.
We provide a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision.
arXiv Detail & Related papers (2020-04-28T09:15:50Z)