Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners
- URL: http://arxiv.org/abs/2212.08066v1
- Date: Thu, 15 Dec 2022 18:59:52 GMT
- Title: Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners
- Authors: Zitian Chen, Yikang Shen, Mingyu Ding, Zhenfang Chen, Hengshuang Zhao,
Erik Learned-Miller, Chuang Gan
- Abstract summary: We propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').
We optimize this matching process during the training of a single model.
Experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
- Score: 74.92558307689265
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Optimization in multi-task learning (MTL) is more challenging than
single-task learning (STL), as the gradient from different tasks can be
contradictory. When tasks are related, it can be beneficial to share some
parameters among them (cooperation). However, some tasks require additional
parameters with expertise in a specific type of data or discrimination
(specialization). To address the MTL challenge, we propose Mod-Squad, a new
model that is Modularized into groups of experts (a 'Squad'). This structure
allows us to formalize cooperation and specialization as the process of
matching experts and tasks. We optimize this matching process during the
training of a single model. Specifically, we incorporate mixture of experts
(MoE) layers into a transformer model, with a new loss that incorporates the
mutual dependence between tasks and experts. As a result, only a small set of
experts are activated for each task. This prevents the sharing of the entire
backbone model between all tasks, which strengthens the model, especially when
the training set size and the number of tasks scale up. More interestingly, for
each task, we can extract the small set of experts as a standalone model that
maintains the same performance as the large model. Extensive experiments on the
Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5
vision tasks show the superiority of our approach.
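For intuition, the mechanism the abstract describes (sparse expert routing plus a loss on the mutual dependence between tasks and experts) can be sketched in PyTorch. This is a minimal illustration under assumed design choices (layer sizes, a linear gate, top-k routing, and a uniform task prior in the mutual-information term), not the authors' released implementation.

```python
# Minimal sketch of an MoE layer with a task-expert mutual-information
# regularizer, in the spirit of Mod-Squad. The gating design, sizes, and
# loss form are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, dim: int = 256, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        ])
        self.gate = nn.Linear(dim, num_experts)  # router over experts
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        # x: (batch, dim). Each input activates only its top-k experts,
        # so most of the model stays unused for any single task.
        probs = F.softmax(self.gate(x), dim=-1)               # (batch, E)
        topk_vals, topk_idx = probs.topk(self.top_k, dim=-1)  # (batch, k)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = topk_idx[:, k] == e
                if mask.any():
                    out[mask] += topk_vals[mask, k].unsqueeze(-1) * expert(x[mask])
        return out, probs

def task_expert_mi_loss(gate_probs_per_task):
    # gate_probs_per_task: dict mapping task name -> (batch, E) gate probs.
    # Maximizing I(task; expert) makes each task depend on a small set of
    # experts while leaving experts free to be shared where that helps.
    p_e_given_t = torch.stack([p.mean(dim=0)
                               for p in gate_probs_per_task.values()])  # (T, E)
    num_tasks = p_e_given_t.shape[0]
    p_t = torch.full((num_tasks, 1), 1.0 / num_tasks,
                     device=p_e_given_t.device)  # uniform task prior (assumed)
    joint = p_t * p_e_given_t                    # p(t, e), shape (T, E)
    p_e = joint.sum(dim=0, keepdim=True)         # (1, E)
    mi = (joint * torch.log(joint / (p_t * p_e) + 1e-9)).sum()
    return -mi  # minimize negative MI alongside the task losses
```

After training, one could keep, for each task, only the experts its gate actually uses and discard the rest, which mirrors the abstract's claim that a small per-task subset of experts can be extracted as a standalone model.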
Related papers
- Concrete Subspace Learning based Interference Elimination for Multi-task
Model Fusion [86.6191592951269]
Merging models that are fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated to be a cheap and scalable strategy for constructing a multi-task model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task
Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for
Multi-task Mathematical Problem Solving [77.51817534090789]
We propose JiuZhang 2.0, a unified Chinese PLM specially designed for multi-task mathematical problem solving.
Our idea is to maintain a moderate-sized model and employ cross-task knowledge sharing to improve the model capacity in a multi-task setting.
arXiv Detail & Related papers (2023-06-19T15:45:36Z)
- Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts [75.75548749888029]
We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks.
With a single model, Musketeer achieves results comparable to or better than strong baselines trained on single tasks, almost uniformly across multiple tasks.
arXiv Detail & Related papers (2023-05-11T17:57:49Z)
- Eliciting Transferability in Multi-task Learning with Task-level
Mixture-of-Experts [29.34065746373841]
Transformer models are capable of multi-task learning on diverse NLP tasks.
Humans, by contrast, tackle tasks more flexibly, making appropriate presumptions about which skills and knowledge are relevant.
We show that the learned routing decisions and experts partially rediscover human categorization of NLP tasks.
arXiv Detail & Related papers (2022-05-25T11:59:05Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
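As a rough illustration of this idea, one might gate a task-specific residual on top of each frozen shared layer; the sigmoid gate and penalty coefficient below are assumptions for the sketch, not the paper's exact formulation.

```python
# Rough sketch of TAPS-style adaptive sharing: each layer keeps the shared
# pretrained weight and learns a gated, task-specific delta. The sigmoid
# gate and penalty weight here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TAPSLinear(nn.Module):
    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # shared weights stay frozen
        self.delta = nn.Parameter(torch.zeros_like(base.weight))
        self.gate_logit = nn.Parameter(torch.tensor(0.0))  # one gate per layer

    def forward(self, x):
        gate = torch.sigmoid(self.gate_logit)       # ~0 keeps the shared layer
        return F.linear(x, self.base.weight + gate * self.delta, self.base.bias)

def gate_sparsity_penalty(model, coef=1e-3):
    # Pushes most gates toward zero so only a few layers turn task-specific.
    gates = [torch.sigmoid(m.gate_logit)
             for m in model.modules() if isinstance(m, TAPSLinear)]
    return coef * torch.stack(gates).sum()
```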
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- Rethinking Hard-Parameter Sharing in Multi-Task Learning [20.792654758645302]
Hard parameter sharing in multi-task learning (MTL) allows tasks to share some of the model parameters, reducing storage cost and improving prediction accuracy.
The common sharing practice is to share bottom layers of a deep neural network among tasks while using separate top layers for each task.
We find that using separate bottom-layer parameters can achieve significantly better performance than this common practice.
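For reference, the common shared-bottom layout that this paper questions can be sketched as follows; the layer sizes and task heads are assumed for illustration.

```python
# Minimal sketch of hard parameter sharing: bottom layers are shared across
# tasks, with a separate top head per task. Sizes are illustrative only.
import torch.nn as nn

class SharedBottomMTL(nn.Module):
    def __init__(self, in_dim=64, hidden=128, task_out_dims=(10, 1)):
        super().__init__()
        self.bottom = nn.Sequential(                 # shared among all tasks
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(                  # one top layer per task
            [nn.Linear(hidden, d) for d in task_out_dims]
        )

    def forward(self, x):
        h = self.bottom(x)
        return [head(h) for head in self.heads]
```

The paper's observation is that making the bottom layers task-specific instead can perform significantly better than this default.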
arXiv Detail & Related papers (2021-07-23T17:26:40Z)
- Latent Group Structured Multi-task Learning [2.827177139912107]
In multi-task learning (MTL), we improve the performance of learning algorithms by training multiple tasks jointly.
We present our group-structured latent-space multi-task learning model, which encourages task groupings defined by prior information.
Experiments are conducted on both synthetic and real-world datasets, showing competitive performance over single-task learning.
arXiv Detail & Related papers (2020-11-24T05:38:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.