Vision Transformer Adapters for Generalizable Multitask Learning
- URL: http://arxiv.org/abs/2308.12372v1
- Date: Wed, 23 Aug 2023 18:40:48 GMT
- Title: Vision Transformer Adapters for Generalizable Multitask Learning
- Authors: Deblina Bhattacharjee, Sabine Süsstrunk, Mathieu Salzmann
- Abstract summary: We introduce the first multitasking vision transformer adapters that learn generalizable task affinities.
Our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner.
In contrast to concurrent methods, we do not require retraining or fine-tuning whenever a new task or domain is added.
- Score: 61.79647180647685
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce the first multitasking vision transformer adapters that learn
generalizable task affinities which can be applied to novel tasks and domains.
Integrated into an off-the-shelf vision transformer backbone, our adapters can
simultaneously solve multiple dense vision tasks in a parameter-efficient
manner, unlike existing multitasking transformers that are parametrically
expensive. In contrast to concurrent methods, we do not require retraining or
fine-tuning whenever a new task or domain is added. We introduce a task-adapted
attention mechanism within our adapter framework that combines gradient-based
task similarities with attention-based ones. The learned task affinities
generalize to the following settings: zero-shot task transfer, unsupervised
domain adaptation, and generalization without fine-tuning to novel domains. We
demonstrate that our approach outperforms not only the existing convolutional
neural network-based multitasking methods but also the vision transformer-based
ones. Our project page is at https://ivrl.github.io/VTAGML.
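As a rough, non-authoritative illustration of the mechanism the abstract describes, the sketch below mixes per-task token features through a learned task-affinity matrix and a small bottleneck adapter on top of a frozen vision-transformer backbone. It is a minimal PyTorch sketch: the class name, tensor shapes, and the choice to keep the affinities as free parameters (rather than combining gradient-based and attention-based similarities as the paper does) are assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskAdaptedAdapter(nn.Module):
    """Conceptual sketch of a task-adapted adapter: per-task token features are
    mixed through a learned task-affinity matrix, then passed through a small
    bottleneck with a residual connection. Not the authors' implementation."""

    def __init__(self, dim: int, num_tasks: int, bottleneck: int = 64):
        super().__init__()
        # Learnable task-affinity logits. In the paper, affinities combine
        # gradient-based and attention-based task similarities; here they are
        # free parameters (an assumption made to keep the sketch short).
        self.affinity_logits = nn.Parameter(torch.zeros(num_tasks, num_tasks))
        self.norm = nn.LayerNorm(dim)
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (num_tasks, batch, seq_len, dim), one token stream per task,
        # taken from a frozen vision-transformer backbone.
        affinity = F.softmax(self.affinity_logits, dim=-1)        # (T, T)
        mixed = torch.einsum("ts,sbnd->tbnd", affinity, tokens)   # cross-task mixing
        return tokens + self.up(F.gelu(self.down(self.norm(mixed))))


# Example: three dense tasks on ViT-B features (196 patch tokens + CLS, dim 768).
adapter = TaskAdaptedAdapter(dim=768, num_tasks=3)
features = torch.randn(3, 2, 197, 768)
out = adapter(features)   # same shape, now task-mixed
```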
Related papers
- Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks [27.59758964060561]
Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities.
Continual learning (CL) emerges as a solution by facilitating the transfer of knowledge across tasks that arrive sequentially for an autonomously learning agent.
We propose a transformer-based CL framework focusing on learning tasks that involve both vision and language.
arXiv Detail & Related papers (2024-01-27T03:03:30Z)
- InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding [11.608682595506354]
Multi-task scene understanding aims to design models that can simultaneously predict several scene understanding tasks with one versatile model.
Previous studies typically process multi-task features in a more local way, and thus cannot effectively learn spatially global and cross-task interactions.
We propose an Inverted Pyramid multi-task Transformer, capable of modeling cross-task interaction among spatial features of different tasks in a global context.
arXiv Detail & Related papers (2023-06-08T00:28:22Z)
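One way to picture the spatially global, cross-task interaction the InvPT++ entry above refers to is joint self-attention over the concatenated token sequences of all tasks. The sketch below is an illustrative PyTorch approximation under that assumption; it is not the actual inverted-pyramid architecture, and the module name and shapes are made up for the example.

```python
import torch
import torch.nn as nn

class GlobalCrossTaskAttention(nn.Module):
    """Toy illustration: concatenate the tokens of all tasks into one sequence
    and apply standard multi-head self-attention, so every spatial location of
    every task can attend to every other (global, cross-task interaction)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, task_feats):
        # task_feats: list of (batch, seq_len, dim) tensors, one per task.
        lengths = [f.shape[1] for f in task_feats]
        x = torch.cat(task_feats, dim=1)              # (batch, total_len, dim)
        y, _ = self.attn(self.norm(x), self.norm(x), self.norm(x))
        x = x + y                                     # residual over the joint sequence
        return list(torch.split(x, lengths, dim=1))   # back to per-task features
```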
- AutoTaskFormer: Searching Vision Transformers for Multi-task Learning [35.38583552145653]
Vision Transformers have shown great performance in single tasks such as classification and segmentation.
Existing multi-task vision transformers are handcrafted and heavily rely on human expertise.
We propose a novel one-shot neural architecture search framework, dubbed AutoTaskFormer, to automate this process.
arXiv Detail & Related papers (2023-04-18T06:30:20Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performance is sub-optimal or even lags far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
- MulT: An End-to-End Multitask Learning Transformer [66.52419626048115]
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks.
Our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads.
arXiv Detail & Related papers (2022-05-17T13:03:18Z)
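The shared-representation-plus-task-specific-heads pattern that the MulT entry above describes can be sketched as below. The encoder interface and the lightweight linear heads are placeholders chosen for brevity (MulT itself uses transformer-based decoder heads); none of the names come from the MulT codebase.

```python
import torch
import torch.nn as nn

class SharedEncoderMultiTask(nn.Module):
    """Placeholder sketch: one shared backbone, one small head per dense task."""

    def __init__(self, encoder: nn.Module, embed_dim: int, task_out_channels: dict):
        super().__init__()
        self.encoder = encoder  # any backbone returning (batch, tokens, embed_dim)
        self.heads = nn.ModuleDict({
            task: nn.Sequential(nn.Linear(embed_dim, embed_dim),
                                nn.GELU(),
                                nn.Linear(embed_dim, out_ch))
            for task, out_ch in task_out_channels.items()
        })

    def forward(self, images: torch.Tensor) -> dict:
        shared = self.encoder(images)                 # shared representation
        # Each head turns the shared tokens into its own per-token prediction.
        return {task: head(shared) for task, head in self.heads.items()}
```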
- Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks [37.2958914602899]
We show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks.
Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task.
arXiv Detail & Related papers (2021-06-08T16:16:40Z)
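To make the shared-hypernetwork idea in the entry above concrete, the sketch below generates the down- and up-projection weights of a bottleneck adapter from small task and layer embeddings with a single shared generator, so only the embeddings grow when tasks are added. Dimensions, names, and the one-layer generator are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AdapterHypernetwork(nn.Module):
    """Illustrative sketch: a shared generator emits per-(task, layer) adapter
    weights from learned task and layer embeddings."""

    def __init__(self, num_tasks: int, num_layers: int, dim: int,
                 bottleneck: int = 32, emb: int = 64):
        super().__init__()
        self.dim, self.bottleneck = dim, bottleneck
        self.task_emb = nn.Embedding(num_tasks, emb)
        self.layer_emb = nn.Embedding(num_layers, emb)
        # Shared across all tasks and layers: embedding -> flattened adapter weights.
        self.generator = nn.Linear(2 * emb, 2 * dim * bottleneck)

    def forward(self, task_id: int, layer_id: int, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, dim) activations of one transformer layer.
        z = torch.cat([self.task_emb.weight[task_id], self.layer_emb.weight[layer_id]])
        w = self.generator(z)
        w_down = w[: self.dim * self.bottleneck].view(self.bottleneck, self.dim)
        w_up = w[self.dim * self.bottleneck:].view(self.dim, self.bottleneck)
        # Generated bottleneck adapter applied with a residual connection.
        return hidden + torch.relu(hidden @ w_down.t()) @ w_up.t()
```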
- Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
The Less attention vIsion Transformer (LIT) builds on the observation that convolutions, fully-connected layers, and self-attention have almost equivalent mathematical expressions for processing image patch sequences.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
arXiv Detail & Related papers (2021-05-29T05:26:07Z)
- Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in the literature.
First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting previously learned ones (incremental learning).
Second, eliminating adverse interactions amongst tasks, which have been shown to significantly degrade single-task performance in a multi-task setup (task interference).
arXiv Detail & Related papers (2020-07-24T14:44:46Z)
- HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections [96.64246471034195]
We propose HyperGrid, a new approach for highly effective multi-task learning.
Our method helps bridge the gap between fine-tuning and multi-task learning approaches.
arXiv Detail & Related papers (2020-07-12T02:49:16Z)