Related papers: Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning

URL: http://arxiv.org/abs/2501.06884v1
Date: Sun, 12 Jan 2025 17:41:23 GMT
Title: Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning
Authors: Hanwen Zhong, Jiaxin Chen, Yutong Zhang, Di Huang, Yunhong Wang
Abstract summary: Multi-Task Learning (MTL) for Vision Transformer aims at enhancing the model capability by tackling multiple tasks simultaneously. Most recent works have predominantly focused on designing Mixture-of-Experts (MoE) structures and integrating Low-Rank Adaptation (LoRA) to efficiently perform multi-task learning. We propose a novel approach dubbed Efficient Multi-Task Learning (EMTAL) by transforming a pre-trained Vision Transformer into an efficient multi-task learner.
Score: 59.001091197106085
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-Task Learning (MTL) for Vision Transformer aims at enhancing the model capability by tackling multiple tasks simultaneously. Most recent works have predominantly focused on designing Mixture-of-Experts (MoE) structures and integrating Low-Rank Adaptation (LoRA) to efficiently perform multi-task learning. However, their rigid combination hampers both the optimization of MoE and the effectiveness of reparameterization of LoRA, leading to sub-optimal performance and low inference speed. In this work, we propose a novel approach dubbed Efficient Multi-Task Learning (EMTAL) by transforming a pre-trained Vision Transformer into an efficient multi-task learner during training, and reparameterizing the learned structure for efficient inference. Specifically, we first develop the MoEfied LoRA structure, which decomposes the pre-trained Transformer into a low-rank MoE structure and employs LoRA to fine-tune the parameters. Subsequently, we take into account the intrinsic asynchronous nature of multi-task learning and devise a learning Quality Retaining (QR) optimization mechanism that leverages the historical high-quality class logits to prevent a well-trained task from performance degradation. Finally, we design a router fading strategy to integrate the learned parameters into the original Transformer, achieving efficient inference. Extensive experiments on public benchmarks demonstrate the superiority of our method compared to the state-of-the-art multi-task learning approaches.
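
The abstract's point about reparameterizing LoRA for efficient inference can be illustrated with the generic LoRA weight merge. The sketch below is not EMTAL's router fading strategy, which the abstract does not detail; it only shows, under assumed toy shapes, why a low-rank update B·A can be folded into a frozen weight W so that inference needs a single matrix. The helper names (matmul, matvec, merge_lora) and all values are hypothetical.

```python
# Generic LoRA merge sketch (assumption: plain LoRA, not EMTAL's MoEfied variant).
# Matrices are lists of rows; only the standard library is used.

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(m, v):
    """Multiply matrix m (n x k) by vector v (length k)."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def merge_lora(w, b, a, scale=1.0):
    """Return W + scale * (B @ A), the merged weight used at inference."""
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Toy example: a 2x2 frozen weight with a rank-1 update.
w = [[1.0, 0.0], [0.0, 1.0]]   # frozen pre-trained weight
b = [[0.5], [-1.0]]            # LoRA "up" matrix, shape 2x1
a = [[2.0, 4.0]]               # LoRA "down" matrix, shape 1x2
x = [1.0, 2.0]                 # input vector

# Training-time path: two branches, W x + B (A x).
unmerged = [p + q for p, q in zip(matvec(w, x), matvec(b, matvec(a, x)))]

# Inference-time path: one merged matrix, (W + B A) x.
merged_w = merge_lora(w, b, a)
merged = matvec(merged_w, x)
```

Both paths give the same output, so the extra branch adds no inference cost once merged. A learned, input-dependent router in front of several such experts breaks this equivalence, which is presumably why the paper needs a dedicated strategy to remove the router before merging.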

Related papers

TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree [52.44403214958304]
In this paper, we introduce TreeLoRA, a novel approach that constructs layer-wise adapters by leveraging hierarchical gradient similarity. To reduce the computational burden of task similarity estimation, we employ bandit techniques to develop an algorithm based on lower confidence bounds. Experiments on both vision transformers (ViTs) and large language models (LLMs) demonstrate the effectiveness and efficiency of our approach.
arXiv Detail & Related papers (2025-06-12T05:25:35Z)
Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer [56.898822179122476]
We propose M3DT, a novel mixture-of-experts (MoE) framework that tackles task scalability by further unlocking the model's parameter scalability. Experimental results show that, by increasing the number of experts, M3DT not only consistently improves its performance as the model expands on a fixed number of tasks, but also exhibits remarkable task scalability, successfully extending to 160 tasks with superior performance.
arXiv Detail & Related papers (2025-05-30T09:08:52Z)
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners [60.75160178669076]
We show that the use of high-capacity value models trained via cross-entropy and conditioned on learnable task embeddings addresses the problem of task interference in online reinforcement learning. We test our approach on 7 multi-task benchmarks with over 280 unique tasks, spanning high degree-of-freedom humanoid control and discrete vision-based RL.
arXiv Detail & Related papers (2025-05-29T06:41:45Z)
MoRE: A Mixture of Low-Rank Experts for Adaptive Multi-Task Learning [18.0412262027514]
We propose a novel Mixture of Low-Rank Experts (MoRE) for multi-task. Instead of using an individual LoRA for each task, we align different ranks of LoRA module with different tasks. We also design a novel adaptive rank selector to select the appropriate expert for each task.
arXiv Detail & Related papers (2025-05-28T12:32:09Z)
Multi-Task Reinforcement Learning Enables Parameter Scaling [14.492112822801113]
Multi-task reinforcement learning aims to endow a single agent with the ability to perform well on multiple tasks. Recent works have focused on developing novel sophisticated architectures to improve performance. We argue that gains are mostly due to scale by demonstrating that naively scaling up a simple MTRL baseline to match parameter counts outperforms the more sophisticated architectures.
arXiv Detail & Related papers (2025-03-07T04:13:02Z)
MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts [6.245113492272563]
Mixture of Dyadic Experts (MoDE) is a novel design for efficient multi-task adaptation. Our design allows for more fine-grained mixing, thereby increasing the model's ability to jointly handle multiple tasks.
arXiv Detail & Related papers (2024-08-02T18:05:10Z)
Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning [50.73666458313015]
Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications. MoE has emerged as a promising solution with its sparse architecture for effective task decoupling. Intuition-MoR1E achieves superior efficiency and 2.15% overall accuracy improvement across 14 public datasets.
arXiv Detail & Related papers (2024-04-13T12:14:58Z)
Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks [27.59758964060561]
Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities. Continual learning (CL) emerges as a solution by facilitating the transfer of knowledge across tasks that arrive sequentially for an autonomously learning agent. We propose a transformer-based CL framework focusing on learning tasks that involve both vision and language.
arXiv Detail & Related papers (2024-01-27T03:03:30Z)
Cross-Task Affinity Learning for Multitask Dense Scene Predictions [5.939164722752263]
Multitask learning (MTL) has become prominent for its ability to predict multiple tasks jointly. We introduce the Cross-Task Affinity Learning (CTAL) module, a lightweight framework that enhances task refinement in multitask networks. Our results demonstrate state-of-the-art MTL performance for both CNN and transformer backbones, using significantly fewer parameters than single-task learning.
arXiv Detail & Related papers (2024-01-20T05:31:47Z)
Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts. This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals. We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
arXiv Detail & Related papers (2023-10-14T22:24:58Z)
Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks [36.34331439747556]
We propose Polyhistor and Polyhistor-Lite to share information across different tasks with a few trainable parameters. Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while only using 10% of their trainable parameters.
arXiv Detail & Related papers (2022-10-07T00:25:02Z)
Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks. We find that their performances are sub-optimal or even lag far behind the single-task baseline. We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing one single architecture to fit tasks. Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy. The proposed model, named as Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in literature. First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting the previously learned ones (incremental learning). Second, eliminating adverse interactions amongst tasks, which has been shown to significantly degrade the single-task performance in a multi-task setup (task interference).
arXiv Detail & Related papers (2020-07-24T14:44:46Z)
Gradient Surgery for Multi-Task Learning [119.675492088251]
Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks. The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood. We propose a form of gradient surgery that projects a task's gradient onto the normal plane of the gradient of any other task that has a conflicting gradient.
arXiv Detail & Related papers (2020-01-19T06:33:47Z)
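
The projection described in the Gradient Surgery entry above can be sketched for a single pair of task gradients. This is a minimal illustration of the pairwise step only, assuming gradients are flattened into plain Python lists; the function names are hypothetical, and how the step is applied across all task pairs during training is not covered here.

```python
# Pairwise gradient projection sketch: if g_i conflicts with g_j
# (negative dot product), remove from g_i its component along g_j.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project_conflicting(g_i, g_j):
    """Project g_i onto the normal plane of g_j when the two conflict."""
    d = dot(g_i, g_j)
    if d >= 0:
        # No conflict: leave the gradient unchanged.
        return list(g_i)
    # d < 0 implies g_j is non-zero, so the division is safe.
    coeff = d / dot(g_j, g_j)
    return [a - coeff * b for a, b in zip(g_i, g_j)]

# Conflicting pair: the projected gradient becomes orthogonal to g_j.
g_task_a = [1.0, 0.0]
g_task_b = [-1.0, 1.0]
projected = project_conflicting(g_task_a, g_task_b)

# Non-conflicting pair: the gradient is returned as-is.
unchanged = project_conflicting([1.0, 1.0], [1.0, 0.0])
```

After projection, the update for one task no longer has a component that directly opposes the other task's gradient, which is the sense in which the conflict is removed.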

This list is automatically generated from the titles and abstracts of the papers on this site.