Your representations are in the network: composable and parallel
adaptation for large scale models
- URL: http://arxiv.org/abs/2303.04105v2
- Date: Tue, 31 Oct 2023 06:19:06 GMT
- Title: Your representations are in the network: composable and parallel
adaptation for large scale models
- Authors: Yonatan Dukler, Alessandro Achille, Hao Yang, Varsha Vivek, Luca
Zancato, Benjamin Bowman, Avinash Ravichandran, Charless Fowlkes, Ashwin
Swaminathan, Stefano Soatto
- Abstract summary: InCA is a lightweight method for transfer learning that cross-attends to any activation layer of a pre-trained model.
We show that, even when selecting a single top-scoring adapter, InCA achieves performance comparable to full fine-tuning.
- Score: 90.26965623489157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose InCA, a lightweight method for transfer learning that
cross-attends to any activation layer of a pre-trained model. During training,
InCA uses a single forward pass to extract multiple activations, which are
passed to external cross-attention adapters, trained anew and combined or
selected for downstream tasks. We show that, even when selecting a single
top-scoring adapter, InCA achieves performance comparable to full fine-tuning,
at a cost comparable to fine-tuning just the last layer. For example, with a
cross-attention probe 1.3% the size of a pre-trained ViT-L/16 model, we achieve
performance within 0.2% of the full fine-tuning paragon at a computational
training cost of 51% of the baseline, on average across 11 downstream
classification tasks. Unlike other forms of efficient adaptation, InCA does not
require backpropagating through the pre-trained model, thus leaving its
execution unaltered at both training and inference. The versatility of InCA is
best illustrated in fine-grained tasks, which may require accessing information
absent in the last layer but accessible in intermediate layer activations.
Since the backbone is fixed, InCA allows parallel ensembling as well as
parallel execution of multiple tasks. InCA achieves state-of-the-art
performance in the ImageNet-to-Sketch multi-task benchmark.
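To make the adaptation pattern in the abstract concrete, below is a minimal illustrative sketch in PyTorch. It is not the authors' implementation: the toy transformer backbone, the class name CrossAttnAdapter, and hyperparameters such as num_queries and the hidden width are assumptions chosen only to show the idea of a single frozen forward pass feeding externally trained cross-attention adapters.

```python
# Sketch only (not the InCA code): learned queries cross-attend to frozen
# intermediate activations of a pre-trained backbone; the backbone itself
# receives no gradients at any point.
import torch
import torch.nn as nn

class CrossAttnAdapter(nn.Module):
    """Lightweight probe: learned queries cross-attend to one activation map."""
    def __init__(self, dim, num_classes, num_queries=1, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, activations):                        # (B, tokens, dim)
        q = self.queries.expand(activations.size(0), -1, -1)
        out, _ = self.attn(q, activations, activations)    # cross-attention only
        return self.head(self.norm(out).mean(dim=1))       # pooled logits

# Toy frozen backbone standing in for a pre-trained ViT.
backbone = nn.ModuleList(
    [nn.TransformerEncoderLayer(384, 6, batch_first=True) for _ in range(4)]
).eval()
for p in backbone.parameters():
    p.requires_grad_(False)

adapters = [CrossAttnAdapter(dim=384, num_classes=10) for _ in range(4)]

x = torch.randn(2, 197, 384)          # dummy token sequence (batch of 2)
acts = []
with torch.no_grad():                 # no backprop through the pre-trained model
    h = x
    for layer in backbone:
        h = layer(h)
        acts.append(h)                # collect one activation map per layer

# Each adapter is trained independently on "its" layer; here we just run them.
logits_per_adapter = [adapter(a) for adapter, a in zip(adapters, acts)]
```

Because the backbone never requires gradients, the activations from one forward pass can be cached and shared, so many adapters (for ensembling or for different tasks) can be trained and executed in parallel, as the abstract describes.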
Related papers
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z) - Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise.
arXiv Detail & Related papers (2022-12-01T17:31:42Z) - Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to narrow the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z) - Effective Adaptation in Multi-Task Co-Training for Unified Autonomous
Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performance is sub-optimal or even lags far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z) - Task-Customized Self-Supervised Pre-training with Scalable Dynamic
Routing [76.78772372631623]
A common practice for self-supervised pre-training is to use as much data as possible.
For a specific downstream task, however, involving irrelevant data in pre-training may degrade downstream performance.
It is burdensome and often infeasible to curate a customized pre-training dataset for every downstream task.
arXiv Detail & Related papers (2022-05-26T10:49:43Z) - DAFormer: Improving Network Architectures and Training Strategies for
Domain-Adaptive Semantic Segmentation [99.88539409432916]
We study the unsupervised domain adaptation (UDA) process.
We propose a novel UDA method, DAFormer, based on the benchmark results.
DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA->Cityscapes and 5.4 mIoU for Synthia->Cityscapes.
arXiv Detail & Related papers (2021-11-29T19:00:46Z) - Robust Transfer Learning with Pretrained Language Models through
Adapters [40.45102278979193]
Transfer learning with large pretrained language models like BERT has become a dominant approach for most NLP tasks.
We propose a simple yet effective adapter-based approach to mitigate these issues.
Our experiments demonstrate that such a training scheme leads to improved stability and adversarial robustness in transfer learning to various downstream tasks.
arXiv Detail & Related papers (2021-08-05T02:30:13Z) - Semi-Supervised Few-Shot Classification with Deep Invertible Hybrid
Models [4.189643331553922]
We propose a deep invertible hybrid model which integrates discriminative and generative learning at a latent space level for semi-supervised few-shot classification.
Our main originality lies in our integration of these components at a latent space level, which is effective in preventing overfitting.
arXiv Detail & Related papers (2021-05-22T05:55:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including all listed content) and is not responsible for any consequences of its use.