ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
- URL: http://arxiv.org/abs/2111.10952v1
- Date: Mon, 22 Nov 2021 02:34:46 GMT
- Title: ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
- Authors: Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven
Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo
Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler
- Abstract summary: This paper introduces ExMix, a massive collection of 107 supervised NLP tasks across diverse domains and task-families.
Using ExMix, we study the effect of multi-task pre-training at the largest scale to date, and analyze co-training transfer amongst common families of tasks.
We propose ExT5, a model pre-trained using a multi-task objective of self-supervised span denoising and supervised ExMix.
- Score: 56.54359715403561
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the recent success of multi-task learning and transfer learning for
natural language processing (NLP), few works have systematically studied the
effect of scaling up the number of tasks during pre-training. Towards this
goal, this paper introduces ExMix (Extreme Mixture): a massive collection of
107 supervised NLP tasks across diverse domains and task-families. Using ExMix,
we study the effect of multi-task pre-training at the largest scale to date,
and analyze co-training transfer amongst common families of tasks. Through this
analysis, we show that manually curating an ideal set of tasks for multi-task
pre-training is not straightforward, and that multi-task scaling can vastly
improve models on its own. Finally, we propose ExT5: a model pre-trained using
a multi-task objective of self-supervised span denoising and supervised ExMix.
Via extensive experiments, we show that ExT5 outperforms strong T5 baselines on
SuperGLUE, GEM, Rainbow, Closed-Book QA tasks, and several tasks outside of
ExMix. ExT5 also significantly improves sample efficiency while pre-training.
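To make the pre-training recipe concrete, here is a minimal sketch of a T5-style text-to-text mixture that interleaves self-supervised span denoising with supervised tasks. The sentinel tokens, corruption rate, mixing ratio, and helper names are illustrative assumptions, not ExT5's exact configuration.

```python
import random

# Minimal sketch (not the authors' code): a text-to-text mixture that combines
# self-supervised span denoising with supervised tasks cast into the same format.
# Sentinels, corruption rate, and the mixing ratio are illustrative assumptions.

def span_denoise(text: str, corruption_rate: float = 0.15):
    """Corrupt one contiguous span of the input and ask the model to recover it."""
    tokens = text.split()
    n = max(1, int(len(tokens) * corruption_rate))
    start = random.randrange(0, max(1, len(tokens) - n + 1))
    inputs = tokens[:start] + ["<extra_id_0>"] + tokens[start + n:]
    targets = ["<extra_id_0>"] + tokens[start:start + n] + ["<extra_id_1>"]
    return " ".join(inputs), " ".join(targets)

def to_text_to_text(task_name: str, source: str, target: str):
    """Cast a supervised example into the text-to-text format with a task prefix."""
    return f"{task_name}: {source}", target

def sample_mixture(raw_texts, supervised_pool, batch_size=8, denoise_ratio=0.5):
    """Draw a mixed batch: span denoising with prob. denoise_ratio, else supervised."""
    batch = []
    for _ in range(batch_size):
        if random.random() < denoise_ratio:
            batch.append(span_denoise(random.choice(raw_texts)))
        else:
            batch.append(to_text_to_text(*random.choice(supervised_pool)))
    return batch
```

Here `supervised_pool` is assumed to hold (task_name, source, target) triples from the supervised task collection; the 0.5 mixing ratio is arbitrary and only for illustration.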
Related papers
- Optimizing Dense Visual Predictions Through Multi-Task Coherence and Prioritization [7.776434991976473]
Multi-Task Learning (MTL) involves the concurrent training of multiple tasks.
We propose an advanced MTL model specifically designed for dense vision tasks.
arXiv Detail & Related papers (2024-12-04T10:05:47Z)
- Instruction Pre-Training: Language Models are Supervised Multitask Learners [115.95022434390181]
In this paper, we propose a framework that augments massive raw corpora with instruction-response pairs to pre-train language models (LMs).
In our experiments, we synthesize 200M instruction-response pairs covering 40+ task categories to verify the effectiveness of Instruction Pre-Training.
arXiv Detail & Related papers (2024-06-20T16:55:33Z)
- Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition [10.36399200974439]
We introduce a novel method combining multi-modal and multi-task unsupervised pre-training with a translation-based supervised mid-training approach.
We empirically demonstrate that such a multi-stage approach leads to relative word error rate (WER) improvements of up to 38.45% over baselines on both Librispeech and SUPERB.
arXiv Detail & Related papers (2024-03-28T20:23:39Z)
- Cross-Task Affinity Learning for Multitask Dense Scene Predictions [5.939164722752263]
Multitask learning (MTL) has become prominent for its ability to predict multiple tasks jointly.
We introduce the Cross-Task Affinity Learning (CTAL) module, a lightweight framework that enhances task refinement in multitask networks.
Our results demonstrate state-of-the-art MTL performance for both CNN and transformer backbones, using significantly fewer parameters than single-task learning.
arXiv Detail & Related papers (2024-01-20T05:31:47Z)
- DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines [15.332562681746081]
This paper proposes a dynamic micro-batching approach to tackle sequence length variation and enable efficient multi-task model training.
We optimize micro-batch construction using a dynamic programming-based approach, and handle micro-batch execution time variation through dynamic pipeline and communication scheduling.
arXiv Detail & Related papers (2023-11-17T09:48:45Z)
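As a rough illustration of the dynamic-programming flavour of micro-batch construction described in the DynaPipe summary above, the hypothetical sketch below partitions length-sorted sequences into contiguous micro-batches so as to minimize padded tokens under a token budget; it is a simplification for intuition, not DynaPipe's actual algorithm or cost model.

```python
# Hypothetical sketch: partition length-sorted sequences into contiguous
# micro-batches, minimizing total padded tokens via dynamic programming.
# The cost model (padded tokens under a per-micro-batch budget) is an
# assumption for illustration, not DynaPipe's actual objective.

def plan_micro_batches(seq_lens, max_tokens_per_micro_batch):
    lens = sorted(seq_lens)
    n = len(lens)
    INF = float("inf")
    best = [INF] * (n + 1)   # best[j]: min padded tokens covering lens[:j]
    cut = [0] * (n + 1)      # cut[j]: start index of the last micro-batch
    best[0] = 0
    for j in range(1, n + 1):
        for i in range(j - 1, -1, -1):
            padded = (j - i) * lens[j - 1]   # batch lens[i:j] pads to its longest
            if padded > max_tokens_per_micro_batch:
                break  # extending the batch further only increases padded size
            if best[i] + padded < best[j]:
                best[j] = best[i] + padded
                cut[j] = i
        if best[j] == INF:                       # a single sequence exceeds the budget:
            best[j] = best[j - 1] + lens[j - 1]  # give it its own micro-batch
            cut[j] = j - 1
    batches, j = [], n                           # recover the partition
    while j > 0:
        batches.append(lens[cut[j]:j])
        j = cut[j]
    return list(reversed(batches))

# Example: plan_micro_batches([12, 48, 50, 200, 210], max_tokens_per_micro_batch=512)
```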
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as the strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
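For intuition about the task-prefix idea summarized above, the sketch below prepends an explicit task marker to each input and then reuses the prefixes to probe how well one task's conditioning transfers to another. The `[TASK=...]` format and the `model.score` interface are hypothetical, not the paper's exact scheme.

```python
# Illustrative sketch: task-prefix conditioning for multi-task pre-training.
# The "[TASK=...]" marker format and the model.score interface are hypothetical.

def with_task_prefix(task_name: str, text: str) -> str:
    """Prepend an explicit task marker so the model can condition on the task."""
    return f"[TASK={task_name}] {text}"

def probe_task_affinity(model, eval_examples, source_task: str) -> float:
    """Score target-task examples while conditioning on another task's prefix.

    Comparing the resulting scores across different source prefixes gives a
    rough, purely illustrative signal of how related two tasks might be.
    """
    total = 0.0
    for text, target in eval_examples:
        total += model.score(with_task_prefix(source_task, text), target)
    return total / max(1, len(eval_examples))
```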
- Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learning.
We devise task-aware gating functions to route examples from different tasks to specialized experts.
This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model.
arXiv Detail & Related papers (2022-04-16T00:56:12Z)
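The sketch below shows one way a task-aware gate could route tokens to a small set of specialized experts while keeping per-token compute close to a dense layer; the shapes, the top-1 routing rule, and the additive task embedding are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

# Hypothetical sketch of task-aware top-1 gating for a Mixture-of-Experts layer:
# the gate conditions on the token representation plus a task embedding, so
# examples from different tasks can be routed to specialized experts.
rng = np.random.default_rng(0)
d_model, n_experts, n_tasks = 64, 8, 4

W_gate = rng.normal(size=(d_model, n_experts)) * 0.02
task_emb = rng.normal(size=(n_tasks, d_model)) * 0.02
experts = [rng.normal(size=(d_model, d_model)) * 0.02 for _ in range(n_experts)]

def moe_layer(x, task_id):
    """x: [seq, d_model] token representations for one example of task `task_id`."""
    gate_logits = (x + task_emb[task_id]) @ W_gate   # [seq, n_experts]
    expert_ids = gate_logits.argmax(axis=-1)         # top-1 routing per token
    y = np.empty_like(x)
    for e in range(n_experts):
        mask = expert_ids == e
        if mask.any():
            # Only the selected expert runs for each token (sparse activation),
            # so compute stays close to that of a dense layer of the same width.
            y[mask] = x[mask] @ experts[e]
    return y

# Example: route a 10-token sequence from task 2 through the layer.
out = moe_layer(rng.normal(size=(10, d_model)), task_id=2)
```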
- Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
First, we propose a new benchmark suite aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
- MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning [82.62433731378455]
We show that tasks with high affinity at a certain scale are not guaranteed to retain this behaviour at other scales.
We propose a novel architecture, namely MTI-Net, that builds upon this finding.
arXiv Detail & Related papers (2020-01-19T21:02:36Z)