TaskWeb: Selecting Better Source Tasks for Multi-task NLP
- URL: http://arxiv.org/abs/2305.13256v2
- Date: Mon, 4 Dec 2023 04:35:55 GMT
- Title: TaskWeb: Selecting Better Source Tasks for Multi-task NLP
- Authors: Joongwon Kim, Akari Asai, Gabriel Ilharco, Hannaneh Hajishirzi
- Abstract summary: Knowing task relationships via pairwise task transfer improves the selection of one or more source tasks that help in learning a new target task.
We use TaskWeb to estimate the benefit of using a source task for learning a new target task, and to choose a subset of helpful training tasks for multi-task training.
Our method improves overall rankings and top-k precision of source tasks by 10% and 38%, respectively.
- Score: 76.03221609799931
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work in NLP has shown promising results in training models on large
amounts of tasks to achieve better generalization. However, it is not
well understood how tasks are related, and how helpful training tasks can be
chosen for a new task. In this work, we investigate whether knowing task
relationships via pairwise task transfer improves the selection of one or more
source tasks that help in learning a new target task. We provide TaskWeb, a large-scale
benchmark of pairwise task transfers for 22 NLP tasks using three different
model types, sizes, and adaptation methods, spanning about 25,000 experiments.
Then, we design a new method TaskShop based on our analysis of TaskWeb.
TaskShop uses TaskWeb to estimate the benefit of using a source task for
learning a new target task, and to choose a subset of helpful training tasks
for multi-task training. Our method improves overall rankings and top-k
precision of source tasks by 10% and 38%, respectively. We also use TaskShop to
build much smaller multi-task training sets that improve zero-shot performance
across 11 different target tasks by at least 4.3%.
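The paper gives the full details of TaskShop's scoring; as a minimal, hedged sketch of the general idea (not the authors' exact algorithm), the snippet below assumes a TaskWeb-style pairwise transfer table over known tasks plus a hypothetical similarity score between the new target and each known task, and ranks candidate source tasks by a similarity-weighted average of their observed transfer gains. Function and variable names are illustrative assumptions.

```python
# Hedged sketch: estimate how helpful each candidate source task is for a
# *new* target task, given (1) a pairwise transfer matrix over known tasks
# (a TaskWeb-style table) and (2) a similarity score between the new target
# and each known task (e.g. cosine similarity of task embeddings).
from typing import Dict, List

def estimate_source_benefit(
    transfer: Dict[str, Dict[str, float]],   # transfer[src][tgt] = observed gain of src -> tgt
    target_similarity: Dict[str, float],     # similarity(new_target, known_task)
    source: str,
) -> float:
    """Similarity-weighted average of the source's transfer to known targets."""
    num, den = 0.0, 0.0
    for known_tgt, sim in target_similarity.items():
        if known_tgt == source or known_tgt not in transfer.get(source, {}):
            continue
        num += sim * transfer[source][known_tgt]
        den += sim
    return num / den if den > 0 else 0.0

def select_source_tasks(
    transfer: Dict[str, Dict[str, float]],
    target_similarity: Dict[str, float],
    k: int = 5,
) -> List[str]:
    """Rank candidate sources by estimated benefit and keep the top k
    to form a small multi-task training set for the new target."""
    scores = {
        src: estimate_source_benefit(transfer, target_similarity, src)
        for src in transfer
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In this sketch, the selected top-k sources would then be used as the small multi-task training set whose zero-shot performance on the target task is evaluated.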
Related papers
- Cross-Task Affinity Learning for Multitask Dense Scene Predictions [5.939164722752263]
Multitask learning (MTL) has become prominent for its ability to predict multiple tasks jointly.
We introduce the Cross-Task Affinity Learning (CTAL) module, a lightweight framework that enhances task refinement in multitask networks.
Our results demonstrate state-of-the-art MTL performance for both CNN and transformer backbones, using significantly fewer parameters than single-task learning.
arXiv Detail & Related papers (2024-01-20T05:31:47Z)
- Improving Multitask Retrieval by Promoting Task Specialization [36.06044647938725]
We show that it is possible to train a multitask retriever that outperforms task-specific retrievers by promoting task specialization.
The model indeed learns parameters that are more task-specialized compared to naive multitasking without prompting or adaptive learning.
arXiv Detail & Related papers (2023-07-01T13:45:15Z)
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as the strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
- Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning [28.104054292437525]
We disentangle the effect of scale and relatedness of tasks in multi-task representation learning.
If the target tasks are known ahead of time, then training on a smaller set of related tasks is competitive with large-scale multi-task training.
arXiv Detail & Related papers (2022-04-23T18:11:35Z)
- Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) models improve multi-task learning.
We devise task-aware gating functions to route examples from different tasks to specialized experts.
This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model.
arXiv Detail & Related papers (2022-04-16T00:56:12Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- Efficiently Identifying Task Groupings for Multi-Task Learning [55.80489920205404]
Multi-task learning can leverage information learned by one task to benefit the training of other tasks.
We suggest an approach to select which tasks should train together in multi-task learning models.
Our method determines task groupings in a single training run by co-training all tasks together and quantifying the extent to which one task's gradient update would affect another task's loss (a rough sketch of this measurement appears after this list).
arXiv Detail & Related papers (2021-09-10T02:01:43Z)
- Knowledge Distillation for Multi-task Learning [38.20005345733544]
Multi-task learning (MTL) learns a single model that performs multiple tasks, aiming for good performance on all tasks at a lower computational cost.
Learning such a model requires jointly optimizing the losses of a set of tasks with different difficulty levels, magnitudes, and characteristics.
We propose a knowledge distillation based method in this work to address the imbalance problem in multi-task learning.
arXiv Detail & Related papers (2020-07-14T08:02:42Z)
- MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning [82.62433731378455]
We show that tasks with high affinity at a certain scale are not guaranteed to retain this behaviour at other scales.
We propose a novel architecture, namely MTI-Net, that builds upon this finding.
arXiv Detail & Related papers (2020-01-19T21:02:36Z)
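The inter-task affinity measurement mentioned in the Task Groupings entry above can be sketched as follows. This is a rough illustration, not the authors' implementation: it assumes each task's loss on a shared batch is exposed as a callable over the model, and it uses a plain SGD lookahead step.

```python
# Rough sketch of the inter-task affinity idea (an assumption-laden illustration):
# take a lookahead SGD step on task i's loss and measure the relative change in
# task j's loss. Positive values suggest task i's update also helps task j,
# making the two tasks good candidates to train together.
import copy
import torch

def inter_task_affinity(model, task_i_loss, task_j_loss, lr: float = 1e-3) -> float:
    """affinity(i -> j) = 1 - L_j(theta') / L_j(theta), where theta' is theta
    after one SGD step on task i's loss. task_*_loss are callables mapping the
    model to a scalar loss tensor (an assumption of this sketch)."""
    base_loss_j = task_j_loss(model).item()

    # Take the lookahead step on a copy so the real model is left untouched.
    lookahead = copy.deepcopy(model)
    params = list(lookahead.parameters())
    grads = torch.autograd.grad(task_i_loss(lookahead), params, allow_unused=True)
    with torch.no_grad():
        for p, g in zip(params, grads):
            if g is not None:
                p -= lr * g

    return 1.0 - task_j_loss(lookahead).item() / base_loss_j
```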