M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task
Learning with Model-Accelerator Co-design
- URL: http://arxiv.org/abs/2210.14793v1
- Date: Wed, 26 Oct 2022 15:40:24 GMT
- Title: M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task
Learning with Model-Accelerator Co-design
- Authors: Hanxue Liang, Zhiwen Fan, Rishov Sarkar, Ziyu Jiang, Tianlong Chen,
Kai Zou, Yu Cheng, Cong Hao, Zhangyang Wang
- Abstract summary: Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often lets those tasks learn better jointly.
Current MTL regimes have to activate nearly the entire model even just to execute a single task.
We present a model-accelerator co-design framework to enable efficient on-device MTL.
- Score: 95.41238363769892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-task learning (MTL) encapsulates multiple learned tasks in a single
model and often lets those tasks learn better jointly. However, when deploying
MTL onto those real-world systems that are often resource-constrained or
latency-sensitive, two prominent challenges arise: (i) during training,
simultaneously optimizing all tasks is often difficult due to gradient
conflicts across tasks; (ii) at inference, current MTL regimes have to activate
nearly the entire model even just to execute a single task. Yet most real
systems demand only one or two tasks at any given moment, switching between
tasks as needed; such all-task-activated inference is therefore highly
inefficient and does not scale. In this paper, we present a model-accelerator
co-design framework to enable efficient on-device MTL. Our framework, dubbed
M$^3$ViT, customizes mixture-of-experts (MoE) layers into a vision transformer
(ViT) backbone for MTL, and sparsely activates task-specific experts during
training. Then at inference with any task of interest, the same design allows
for activating only the task-corresponding sparse expert pathway, instead of
the full model. Our new model design is further enhanced by hardware-level
innovations, in particular, a novel computation reordering scheme tailored for
memory-constrained MTL that achieves zero-overhead switching between tasks and
can scale to any number of experts. When executing single-task inference,
M$^{3}$ViT achieves higher accuracies than encoder-focused MTL methods, while
reducing inference FLOPs by 88%. When implemented on a hardware
platform of one Xilinx ZCU104 FPGA, our co-design framework reduces the memory
requirement by 2.4 times, while achieving energy efficiency up to 9.23 times
higher than a comparable FPGA baseline. Code is available at:
https://github.com/VITA-Group/M3ViT.
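To make the task-conditioned routing described above concrete, below is a minimal PyTorch sketch of an MoE layer in the spirit of M$^3$ViT: a learned task embedding conditions the gate, so each task activates only its own top-k expert pathway and single-task inference leaves the remaining experts idle. All names and sizes (`TaskMoE`, `d_model`, `n_experts`, `top_k`) are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
# Illustrative sketch only: task-conditioned MoE routing for a ViT MLP block.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskMoE(nn.Module):
    """Replaces the dense MLP block of a ViT layer with sparse, task-routed experts."""

    def __init__(self, d_model=384, d_hidden=1536, n_experts=16, n_tasks=2, top_k=4):
        super().__init__()
        self.top_k = top_k
        # Expert FFNs; only top_k of them run for any given token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        # Task embedding conditions the router, giving each task its own sparse pathway.
        self.task_embed = nn.Embedding(n_tasks, d_model)
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x, task_id):
        # x: (batch, tokens, d_model); task_id selects the active task.
        t = self.task_embed(torch.tensor(task_id, device=x.device))
        logits = self.gate(x + t)                        # task-conditioned routing scores
        weights, idx = logits.topk(self.top_k, dim=-1)   # keep only top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in idx[..., k].unique():               # run each selected expert once
                mask = idx[..., k] == e
                out[mask] += weights[..., k][mask].unsqueeze(-1) * self.experts[int(e)](x[mask])
        return out


# Single-task inference touches only the experts routed for that task,
# so most of the model's parameters stay inactive.
layer = TaskMoE()
tokens = torch.randn(2, 196, 384)        # e.g. 14x14 patch tokens
seg_out = layer(tokens, task_id=0)       # e.g. semantic segmentation
depth_out = layer(tokens, task_id=1)     # e.g. depth estimation
```

In this reading of the abstract, all experts receive gradients during training because every task is sampled, while at deployment only the expert weights along the requested task's pathway need to be fetched, which is the property the hardware-level computation reordering is designed to exploit.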
Related papers
- AdapMTL: Adaptive Pruning Framework for Multitask Learning Model [5.643658120200373]
AdapMTL is an adaptive pruning framework for multitask models.
It balances sparsity allocation and accuracy performance across multiple tasks.
It showcases superior performance compared to state-of-the-art pruning methods.
arXiv Detail & Related papers (2024-08-07T17:19:15Z)
- MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning [28.12788291168137]
We present a multi-task fine-tuning framework, MFTcoder, that enables simultaneous and parallel fine-tuning on multiple tasks.
Experiments have conclusively demonstrated that our multi-task fine-tuning approach outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks.
arXiv Detail & Related papers (2023-11-04T02:22:40Z)
- Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction [126.34551436845133]
CNNs and Transformers have their own advantages, and both have been widely used for dense prediction in multi-task learning (MTL).
We present a novel MTL model by combining both merits of deformable CNN and query-based Transformer with shared gating for multi-task learning of dense prediction.
arXiv Detail & Related papers (2023-08-10T17:37:49Z)
- Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts [60.1586169973792]
M$^3$ViT is the latest multi-task ViT model that introduces mixture-of-experts (MoE).
MoE achieves better accuracy and over 80% reduction in computation, but leaves challenges for efficient deployment on FPGA.
Our work, dubbed Edge-MoE, solves the challenges to introduce the first end-to-end FPGA accelerator for multi-task ViT with a collection of architectural innovations.
arXiv Detail & Related papers (2023-05-30T02:24:03Z)
- AdaMTL: Adaptive Input-dependent Inference for Efficient Multi-Task Learning [1.4963011898406864]
We introduce AdaMTL, an adaptive framework that learns task-aware inference policies for multi-task learning models.
AdaMTL reduces the computational complexity by 43% while improving the accuracy by 1.32% compared to single-task models.
When deployed on Vuzix M4000 smart glasses, AdaMTL reduces the inference latency and the energy consumption by up to 21.8% and 37.5%, respectively.
arXiv Detail & Related papers (2023-04-17T20:17:44Z)
- Controllable Dynamic Multi-Task Architectures [92.74372912009127]
We propose a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints.
We propose a disentangled training of two hypernetworks, by exploiting task affinity and a novel branching regularized loss, to take input preferences and accordingly predict tree-structured models with adapted weights.
arXiv Detail & Related papers (2022-03-28T17:56:40Z)
- Controllable Pareto Multi-Task Learning [55.945680594691076]
A multi-task learning system aims at solving multiple related tasks at the same time.
With a fixed model capacity, the tasks would be conflicted with each other, and the system usually has to make a trade-off among learning all of them together.
This work proposes a novel controllable multi-task learning framework, to enable the system to make real-time trade-off control among different tasks with a single model.
arXiv Detail & Related papers (2020-10-13T11:53:55Z)
- Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in literature.
First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting the previously learned ones (incremental learning).
Second, eliminating adverse interactions amongst tasks, which have been shown to significantly degrade single-task performance in a multi-task setup (task interference).
arXiv Detail & Related papers (2020-07-24T14:44:46Z)