TaskExpert: Dynamically Assembling Multi-Task Representations with
Memorial Mixture-of-Experts
- URL: http://arxiv.org/abs/2307.15324v1
- Date: Fri, 28 Jul 2023 06:00:57 GMT
- Authors: Hanrong Ye and Dan Xu
- Abstract summary: Recent models consider directly decoding task-specific features from one shared task-generic feature.
Because the input feature is fully shared and each task decoder also shares decoding parameters across different input samples, the feature decoding process is static.
We propose TaskExpert, a novel multi-task mixture-of-experts model that enables learning multiple representative task-generic feature spaces.
- Score: 11.608682595506354
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Learning discriminative task-specific features simultaneously for multiple
distinct tasks is a fundamental problem in multi-task learning. Recent
state-of-the-art models consider directly decoding task-specific features from
one shared task-generic feature (e.g., feature from a backbone layer), and
utilize carefully designed decoders to produce multi-task features. However, because
the input feature is fully shared and each task decoder also shares decoding
parameters across different input samples, the feature decoding process is static
and produces less discriminative task-specific representations. To tackle
this limitation, we propose TaskExpert, a novel multi-task mixture-of-experts
model that enables learning multiple representative task-generic feature spaces
and decoding task-specific features in a dynamic manner. Specifically,
TaskExpert introduces a set of expert networks to decompose the backbone
feature into several representative task-generic features. Then, the
task-specific features are decoded by using dynamic task-specific gating
networks operating on the decomposed task-generic features. Furthermore, to
establish long-range modeling of the task-specific representations from
different layers of TaskExpert, we design a multi-task feature memory that
updates at each layer and acts as an additional feature expert for dynamic
task-specific feature decoding. Extensive experiments demonstrate that our
TaskExpert clearly outperforms previous best-performing methods on all 9
metrics of two competitive multi-task learning benchmarks for visual scene
understanding (i.e., PASCAL-Context and NYUD-v2). Code and models will be made
publicly available at https://github.com/prismformore/Multi-Task-Transformer
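The abstract describes three components that can be read concretely: a set of expert networks that decompose the shared backbone feature, per-task gating networks that dynamically weight those experts for each input sample, and a multi-task feature memory that is updated at every layer and joins the experts as an extra candidate. The PyTorch-style sketch below illustrates one plausible reading of that decoding step; all module names, shapes, and the memory-update rule are assumptions for illustration, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch of the Memorial Mixture-of-Experts decoding described in the
# abstract. Module names, tensor shapes, and the memory-update rule are
# illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MemorialMoELayer(nn.Module):
    def __init__(self, dim: int, num_experts: int, num_tasks: int):
        super().__init__()
        self.num_tasks = num_tasks
        # Expert networks decompose the shared backbone feature into
        # several representative task-generic features.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
             for _ in range(num_experts)]
        )
        # One gating network per task; it scores the experts plus one extra
        # slot for the multi-task feature memory (hence num_experts + 1).
        self.gates = nn.ModuleList(
            [nn.Linear(dim, num_experts + 1) for _ in range(num_tasks)]
        )
        # Assumed form of the per-task feature-memory update.
        self.memory_update = nn.ModuleList(
            [nn.Linear(2 * dim, dim) for _ in range(num_tasks)]
        )

    def forward(self, x, memory):
        # x:      (B, N, C) shared backbone tokens at this layer
        # memory: list of num_tasks tensors, each (B, N, C), carried across layers
        expert_feats = torch.stack([e(x) for e in self.experts], dim=-2)  # (B, N, E, C)
        task_feats, new_memory = [], []
        for t in range(self.num_tasks):
            # Candidates for this task: decomposed experts + the feature memory.
            candidates = torch.cat([expert_feats, memory[t].unsqueeze(-2)], dim=-2)
            # Sample-dependent gating is what makes the decoding dynamic.
            weights = F.softmax(self.gates[t](x), dim=-1)             # (B, N, E+1)
            feat = (weights.unsqueeze(-1) * candidates).sum(dim=-2)   # (B, N, C)
            task_feats.append(feat)
            # Update the memory so later layers can reuse this task's representation.
            new_memory.append(self.memory_update[t](torch.cat([memory[t], feat], dim=-1)))
        return task_feats, new_memory
```

Under these assumptions, the memory list would be initialized (e.g., to zeros) before the first layer and threaded through successive layers, which is what provides the long-range, cross-layer modeling mentioned in the abstract; a per-task prediction head would then consume each entry of `task_feats`.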
Related papers
- Prompt Tuning with Soft Context Sharing for Vision-Language Models [42.61889428498378]
We propose a novel method to tune pre-trained vision-language models on multiple target few-shot tasks jointly.
We show that SoftCPT significantly outperforms single-task prompt tuning methods.
arXiv Detail & Related papers (2022-08-29T10:19:10Z)
- Multi-Task Learning with Multi-Query Transformer for Dense Prediction [38.476408482050815]
We propose a simple pipeline named Multi-Query Transformer (MQTransformer) to facilitate the reasoning among multiple tasks.
Instead of modeling the dense per-pixel context among different tasks, we seek a task-specific proxy to perform cross-task reasoning.
Experiment results show that the proposed method is an effective approach and achieves state-of-the-art results.
arXiv Detail & Related papers (2022-05-28T06:51:10Z)
- Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learning.
We devise task-aware gating functions to route examples from different tasks to specialized experts.
This results in a sparsely activated multi-task model with a large number of parameters but the same computational cost as a dense model (a minimal sketch of this routing idea follows the related-papers list).
arXiv Detail & Related papers (2022-04-16T00:56:12Z)
- Modular Adaptive Policy Selection for Multi-Task Imitation Learning through Task Division [60.232542918414985]
Multi-task learning often suffers from negative transfer, sharing information that should be task-specific.
The proposed method addresses this by using proto-policies as modules that divide the tasks into simple sub-behaviours which can be shared.
We also demonstrate its ability to autonomously divide the tasks into both shared and task-specific sub-behaviours.
arXiv Detail & Related papers (2022-03-28T15:53:17Z)
- Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
First, we propose a new benchmark suite aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
- CompositeTasking: Understanding Images by Spatial Composition of Tasks [85.95743368954233]
CompositeTasking is the fusion of multiple, spatially distributed tasks.
The proposed network takes a pair of an image and a set of pixel-wise dense tasks as inputs, and makes the task-related predictions for each pixel.
It not only offers us a compact network for multi-tasking, but also allows for task-editing.
arXiv Detail & Related papers (2020-12-16T15:47:02Z)
- Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in the literature.
The first is enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting previously learned ones (incremental learning).
The second is eliminating adverse interactions amongst tasks, which have been shown to significantly degrade single-task performance in a multi-task setup (task interference).
arXiv Detail & Related papers (2020-07-24T14:44:46Z)
- Knowledge Distillation for Multi-task Learning [38.20005345733544]
Multi-task learning (MTL) aims to learn a single model that performs multiple tasks, achieving good performance on all of them at a lower computational cost.
Learning such a model requires jointly optimizing the losses of a set of tasks with different difficulty levels, magnitudes, and characteristics.
We propose a knowledge distillation based method in this work to address the imbalance problem in multi-task learning.
arXiv Detail & Related papers (2020-07-14T08:02:42Z)
- MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning [82.62433731378455]
We show that tasks with high affinity at a certain scale are not guaranteed to retain this behaviour at other scales.
We propose a novel architecture, namely MTI-Net, that builds upon this finding.
arXiv Detail & Related papers (2020-01-19T21:02:36Z)
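As referenced in the "Sparsely Activated Mixture-of-Experts" entry above, the routing idea can be sketched in a few lines. The snippet below is a hedged illustration of task-aware, top-1 gating and is not the cited paper's implementation; the expert architecture, the task-embedding conditioning, and the top-1 choice are assumptions used only to show why the activated compute stays close to that of a single dense network.

```python
# Illustrative sketch of task-aware sparse MoE routing (assumed top-1 routing);
# not the implementation from the cited paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskAwareSparseMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int, num_tasks: int):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        # A task embedding conditions the gate, making routing task-aware.
        self.task_embed = nn.Embedding(num_tasks, dim)
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x, task_id):
        # x: (B, C) sample features; task_id: (B,) integer task labels
        logits = self.gate(x + self.task_embed(task_id))   # (B, E)
        probs = F.softmax(logits, dim=-1)
        top_p, top_idx = probs.max(dim=-1)                  # top-1 expert per sample
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                # Only the selected expert runs for each sample, so activated
                # compute matches a single dense expert despite many parameters.
                out[mask] = top_p[mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Under this reading, the parameter count grows with the number of experts while per-sample FLOPs do not, which matches the claim in that entry.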
This list is automatically generated from the titles and abstracts of the papers on this site.