Vision Transformer Adapters for Generalizable Multitask Learning
- URL: http://arxiv.org/abs/2308.12372v1
- Date: Wed, 23 Aug 2023 18:40:48 GMT
- Title: Vision Transformer Adapters for Generalizable Multitask Learning
- Authors: Deblina Bhattacharjee, Sabine Süsstrunk, Mathieu Salzmann
- Abstract summary: We introduce the first multitasking vision transformer adapters that learn generalizable task affinities.
Our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner.
In contrast to concurrent methods, we do not require retraining or fine-tuning whenever a new task or domain is added.
- Score: 61.79647180647685
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce the first multitasking vision transformer adapters that learn
generalizable task affinities which can be applied to novel tasks and domains.
Integrated into an off-the-shelf vision transformer backbone, our adapters can
simultaneously solve multiple dense vision tasks in a parameter-efficient
manner, unlike existing multitasking transformers that are parametrically
expensive. In contrast to concurrent methods, we do not require retraining or
fine-tuning whenever a new task or domain is added. We introduce a task-adapted
attention mechanism within our adapter framework that combines gradient-based
task similarities with attention-based ones. The learned task affinities
generalize to the following settings: zero-shot task transfer, unsupervised
domain adaptation, and generalization without fine-tuning to novel domains. We
demonstrate that our approach outperforms not only the existing convolutional
neural network-based multitasking methods but also the vision transformer-based
ones. Our project page is at https://ivrl.github.io/VTAGML.
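As a rough, non-authoritative illustration of the mechanism the abstract describes, the sketch below mixes per-task token features through a learned task-affinity matrix and a small bottleneck adapter on top of a frozen vision-transformer backbone. It is a minimal PyTorch sketch: the class name, tensor shapes, and the choice to keep the affinities as free parameters (rather than combining gradient-based and attention-based similarities as the paper does) are assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskAdaptedAdapter(nn.Module):
    """Conceptual sketch of a task-adapted adapter: per-task token features are
    mixed through a learned task-affinity matrix, then passed through a small
    bottleneck with a residual connection. Not the authors' implementation."""

    def __init__(self, dim: int, num_tasks: int, bottleneck: int = 64):
        super().__init__()
        # Learnable task-affinity logits. In the paper, affinities combine
        # gradient-based and attention-based task similarities; here they are
        # free parameters (an assumption made to keep the sketch short).
        self.affinity_logits = nn.Parameter(torch.zeros(num_tasks, num_tasks))
        self.norm = nn.LayerNorm(dim)
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (num_tasks, batch, seq_len, dim), one token stream per task,
        # taken from a frozen vision-transformer backbone.
        affinity = F.softmax(self.affinity_logits, dim=-1)        # (T, T)
        mixed = torch.einsum("ts,sbnd->tbnd", affinity, tokens)   # cross-task mixing
        return tokens + self.up(F.gelu(self.down(self.norm(mixed))))


# Example: three dense tasks on ViT-B features (196 patch tokens + CLS, dim 768).
adapter = TaskAdaptedAdapter(dim=768, num_tasks=3)
features = torch.randn(3, 2, 197, 768)
out = adapter(features)   # same shape, now task-mixed
```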
Related papers
- Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks [27.59758964060561]
Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities.
Continual learning (CL) emerges as a solution by facilitating the transfer of knowledge across tasks that arrive sequentially for an autonomously learning agent.
We propose a transformer-based CL framework focusing on learning tasks that involve both vision and language.
arXiv Detail & Related papers (2024-01-27T03:03:30Z)
- InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding [11.608682595506354]
Multi-task scene understanding aims to design models that can simultaneously predict several scene understanding tasks with one versatile model.
Previous studies typically process multi-task features in a more local way, and thus cannot effectively learn spatially global and cross-task interactions.
We propose an Inverted Pyramid multi-task Transformer, capable of modeling cross-task interaction among spatial features of different tasks in a global context.
arXiv Detail & Related papers (2023-06-08T00:28:22Z)
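One way to picture the spatially global, cross-task interaction the InvPT++ entry above refers to is joint self-attention over the concatenated token sequences of all tasks. The sketch below is an illustrative PyTorch approximation under that assumption; it is not the actual inverted-pyramid architecture, and the module name and shapes are made up for the example.

```python
import torch
import torch.nn as nn

class GlobalCrossTaskAttention(nn.Module):
    """Toy illustration: concatenate the tokens of all tasks into one sequence
    and apply standard multi-head self-attention, so every spatial location of
    every task can attend to every other (global, cross-task interaction)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, task_feats):
        # task_feats: list of (batch, seq_len, dim) tensors, one per task.
        lengths = [f.shape[1] for f in task_feats]
        x = torch.cat(task_feats, dim=1)              # (batch, total_len, dim)
        y, _ = self.attn(self.norm(x), self.norm(x), self.norm(x))
        x = x + y                                     # residual over the joint sequence
        return list(torch.split(x, lengths, dim=1))   # back to per-task features
```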
- AutoTaskFormer: Searching Vision Transformers for Multi-task Learning [35.38583552145653]
Vision Transformers have shown great performance in single tasks such as classification and segmentation.
Existing multi-task vision transformers are handcrafted and heavily rely on human expertise.
We propose a novel one-shot neural architecture search framework, dubbed AutoTaskFormer, to automate this process.
arXiv Detail & Related papers (2023-04-18T06:30:20Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performance is sub-optimal or even lags far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
- MulT: An End-to-End Multitask Learning Transformer [66.52419626048115]
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks.
Our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads.
arXiv Detail & Related papers (2022-05-17T13:03:18Z)
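The shared-representation-plus-task-specific-heads pattern that the MulT entry above describes can be sketched as below. The encoder interface and the lightweight linear heads are placeholders chosen for brevity (MulT itself uses transformer-based decoder heads); none of the names come from the MulT codebase.

```python
import torch
import torch.nn as nn

class SharedEncoderMultiTask(nn.Module):
    """Placeholder sketch: one shared backbone, one small head per dense task."""

    def __init__(self, encoder: nn.Module, embed_dim: int, task_out_channels: dict):
        super().__init__()
        self.encoder = encoder  # any backbone returning (batch, tokens, embed_dim)
        self.heads = nn.ModuleDict({
            task: nn.Sequential(nn.Linear(embed_dim, embed_dim),
                                nn.GELU(),
                                nn.Linear(embed_dim, out_ch))
            for task, out_ch in task_out_channels.items()
        })

    def forward(self, images: torch.Tensor) -> dict:
        shared = self.encoder(images)                 # shared representation
        # Each head turns the shared tokens into its own per-token prediction.
        return {task: head(shared) for task, head in self.heads.items()}
```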
- Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks [37.2958914602899]
We show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks.
Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task.
arXiv Detail & Related papers (2021-06-08T16:16:40Z)
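To make the shared-hypernetwork idea in the entry above concrete, the sketch below generates the down- and up-projection weights of a bottleneck adapter from small task and layer embeddings with a single shared generator, so only the embeddings grow when tasks are added. Dimensions, names, and the one-layer generator are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AdapterHypernetwork(nn.Module):
    """Illustrative sketch: a shared generator emits per-(task, layer) adapter
    weights from learned task and layer embeddings."""

    def __init__(self, num_tasks: int, num_layers: int, dim: int,
                 bottleneck: int = 32, emb: int = 64):
        super().__init__()
        self.dim, self.bottleneck = dim, bottleneck
        self.task_emb = nn.Embedding(num_tasks, emb)
        self.layer_emb = nn.Embedding(num_layers, emb)
        # Shared across all tasks and layers: embedding -> flattened adapter weights.
        self.generator = nn.Linear(2 * emb, 2 * dim * bottleneck)

    def forward(self, task_id: int, layer_id: int, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, dim) activations of one transformer layer.
        z = torch.cat([self.task_emb.weight[task_id], self.layer_emb.weight[layer_id]])
        w = self.generator(z)
        w_down = w[: self.dim * self.bottleneck].view(self.bottleneck, self.dim)
        w_up = w[self.dim * self.bottleneck:].view(self.dim, self.bottleneck)
        # Generated bottleneck adapter applied with a residual connection.
        return hidden + torch.relu(hidden @ w_down.t()) @ w_up.t()
```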
- Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
The Less attention vIsion Transformer (LIT) builds on the observation that convolutions, fully-connected layers, and self-attention have almost equivalent mathematical expressions for processing image patch sequences.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
arXiv Detail & Related papers (2021-05-29T05:26:07Z)
- Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in the literature.
First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting previously learned ones (incremental learning).
Second, eliminating adverse interactions amongst tasks, which have been shown to significantly degrade single-task performance in a multi-task setup (task interference).
arXiv Detail & Related papers (2020-07-24T14:44:46Z)
- HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections [96.64246471034195]
We propose HyperGrid, a new approach for highly effective multi-task learning.
Our method helps bridge the gap between fine-tuning and multi-task learning approaches.
arXiv Detail & Related papers (2020-07-12T02:49:16Z)