Visual Query Tuning: Towards Effective Usage of Intermediate
Representations for Parameter and Memory Efficient Transfer Learning
- URL: http://arxiv.org/abs/2212.03220v2
- Date: Thu, 27 Apr 2023 00:56:59 GMT
- Title: Visual Query Tuning: Towards Effective Usage of Intermediate
Representations for Parameter and Memory Efficient Transfer Learning
- Authors: Cheng-Hao Tu, Zheda Mai, Wei-Lun Chao
- Abstract summary: We propose visual query tuning (VQT) to aggregate intermediate features of Vision Transformers.
As VQT keeps the intermediate features intact and only learns to combine them, it enjoys memory efficiency in training.
VQT consistently surpasses the state-of-the-art approach that utilizes intermediate features for transfer learning.
- Score: 19.254454866466187
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Intermediate features of a pre-trained model have been shown to be informative for making accurate predictions on downstream tasks, even if the model backbone is kept frozen. The key challenge is how to utilize these intermediate features, given their sheer volume. We propose visual query tuning (VQT), a simple yet
effective approach to aggregate intermediate features of Vision Transformers.
By introducing a handful of learnable "query" tokens to each layer, VQT leverages the inner workings of Transformers to "summarize" the rich intermediate
features of each layer, which can then be used to train the prediction heads of
downstream tasks. As VQT keeps the intermediate features intact and only learns
to combine them, it enjoys memory efficiency in training, compared to many
other parameter-efficient fine-tuning approaches that learn to adapt features
and need back-propagation through the entire backbone. This also suggests that VQT plays a complementary role to those approaches in transfer learning.
Empirically, VQT consistently surpasses the state-of-the-art approach that
utilizes intermediate features for transfer learning and outperforms full
fine-tuning in many cases. Compared to parameter-efficient approaches that
adapt features, VQT achieves much higher accuracy under memory constraints.
Most importantly, VQT is compatible with these approaches to attain even higher
accuracy, making it a simple add-on to further boost transfer learning.
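As a concrete illustration of the mechanism, here is a minimal PyTorch-style sketch of a VQT-like readout, assuming the frozen backbone exposes each layer's token sequence (e.g., via forward hooks). The class name `VQTHead` and the simple dot-product pooling are illustrative stand-ins; the actual method injects the query tokens into each layer's self-attention.

```python
import torch
import torch.nn as nn

class VQTHead(nn.Module):
    """Minimal sketch of a VQT-style readout (illustrative, not the
    authors' code). One set of learnable query tokens per frozen layer
    pools that layer's token sequence; a linear head is trained on the
    concatenated per-layer summaries."""

    def __init__(self, num_layers, dim, num_classes, num_queries=1):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_layers, num_queries, dim) * 0.02)
        self.scale = dim ** -0.5
        self.head = nn.Linear(num_layers * num_queries * dim, num_classes)

    def forward(self, layer_features):
        # layer_features: list of [batch, tokens, dim] tensors, one per layer,
        # e.g. collected with forward hooks on the frozen backbone.
        summaries = []
        for q, feats in zip(self.queries, layer_features):
            feats = feats.detach()                           # backbone stays frozen
            attn = (q @ feats.transpose(1, 2)) * self.scale  # [B, Q, T]
            summaries.append(attn.softmax(dim=-1) @ feats)   # [B, Q, dim]
        z = torch.cat(summaries, dim=1).flatten(1)           # [B, L*Q*dim]
        return self.head(z)
```

Because the intermediate features are detached and can be collected under `torch.no_grad()`, no backbone activations are kept for back-propagation, which is where the memory savings described in the abstract come from.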
Related papers
- Visual Fourier Prompt Tuning [63.66866445034855]
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models.
Our approach incorporates the Fast Fourier Transform into prompt embeddings, jointly considering both spatial- and frequency-domain information.
Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
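A rough sketch of how prompt embeddings might pass through an FFT, based only on the summary above; the module name, the 1-D transform, and taking the real part are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FourierPrompt(nn.Module):
    """Illustrative sketch only: a subset of learnable prompt tokens is
    passed through an FFT so the prompts carry frequency-domain as well
    as spatial-domain information."""

    def __init__(self, num_prompts, dim, num_fourier=4):
        super().__init__()
        assert num_fourier <= num_prompts
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.num_fourier = num_fourier

    def forward(self, batch_size):
        # 1-D FFT along the embedding dimension; the real part keeps the
        # result compatible with ordinary real-valued tokens (assumption).
        freq = torch.fft.fft(self.prompts[:self.num_fourier], dim=-1).real
        tokens = torch.cat([freq, self.prompts[self.num_fourier:]], dim=0)
        return tokens.unsqueeze(0).expand(batch_size, -1, -1)  # [B, P, dim]
```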
arXiv Detail & Related papers (2024-11-02T18:18:35Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative memory-efficient transfer learning (METL) strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, only a minimal number of late pre-trained layers is used, which alleviates the peak memory demand.
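A loose sketch of the two-route idea, under the assumption that the "anti-redundancy operation" can be stood in for by a learned weighted sum; none of the names below come from the paper.

```python
import torch
import torch.nn as nn

class WeightedMix(nn.Module):
    """Stand-in for the anti-redundancy consolidation (assumption):
    a learned softmax-weighted sum over early intermediate outputs."""
    def __init__(self, num_feats):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(num_feats))

    def forward(self, feats):
        w = self.w.softmax(dim=0)
        return sum(wi * f for wi, f in zip(w, feats))

def sherl_forward(layers, x, mixer, num_late=1):
    early, late = layers[:-num_late], layers[-num_late:]
    feats = []
    with torch.no_grad():          # early route: frozen, no activations stored
        for layer in early:
            x = layer(x)
            feats.append(x)
    h = mixer(feats)               # consolidation is the first trainable step
    for layer in late:             # late route: gradients flow only from here on
        h = layer(h)
    return h
```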
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
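One hedged reading of this disentanglement, sketched below: the query-synthesizing module takes no input from the pre-trained model and only reads its frozen features out at the end; all names are illustrative.

```python
import torch
import torch.nn as nn

class TaskQuerySynthesizer(nn.Module):
    """Hedged sketch: the task-specific query is produced by a lightweight
    module that is independent of the pre-trained model; it then pools the
    frozen features, so no gradients flow through the backbone."""

    def __init__(self, dim, num_queries=8, hidden=64):
        super().__init__()
        self.seed = nn.Parameter(torch.randn(num_queries, hidden) * 0.02)
        self.mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.GELU(),
                                 nn.Linear(hidden, dim))
        self.scale = dim ** -0.5

    def forward(self, frozen_feats):
        # frozen_feats: [B, T, dim] from the frozen ViT
        q = self.mlp(self.seed)                # synthesized without the backbone
        feats = frozen_feats.detach()
        attn = (q @ feats.transpose(1, 2)) * self.scale
        return attn.softmax(dim=-1) @ feats    # [B, Q, dim] task summary
```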
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning that involves splitting the support and query samples into patches.
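A minimal sketch of mutual attention between support and query patch tokens, as the summary suggests; the bidirectional dot-product form is an assumption, not the paper's exact design.

```python
import torch

def mutual_attention(support, query):
    """Sketch of bidirectional (mutual) attention between the patch tokens
    of support and query samples.
    support: [S, dim] patch tokens, query: [Q, dim] patch tokens."""
    scale = support.shape[-1] ** -0.5
    s2q = (support @ query.t() * scale).softmax(dim=-1) @ query    # support attends to query
    q2s = (query @ support.t() * scale).softmax(dim=-1) @ support  # query attends to support
    return s2q, q2s
```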
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
- Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring pre-trained point cloud models.
Existing methods for model adaptation usually update all model parameters, which is inefficient and incurs high computational costs.
In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z)
- On the Effectiveness of LayerNorm Tuning for Continual Learning in Vision Transformers [47.77328392236625]
State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts.
We introduce a two-stage training procedure: we first optimize the task-specific parameters and then train the classifier with the same selection procedure used at inference time.
Our method achieves results that are either superior or on par with the state of the art while being computationally cheaper.
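The generic recipe behind LayerNorm tuning can be sketched as follows (a common PyTorch pattern, not the paper's exact code): stage 1 enables gradients only for the LayerNorm affine parameters of the frozen backbone.

```python
import torch.nn as nn

def enable_layernorm_tuning(model: nn.Module):
    """Freeze everything, then re-enable gradients only for the
    LayerNorm affine parameters (generic recipe, not the paper's code)."""
    for p in model.parameters():
        p.requires_grad = False
    for m in model.modules():
        if isinstance(m, nn.LayerNorm):
            for p in m.parameters():
                p.requires_grad = True
```

Stage 2 would repeat the same pattern with only the classifier head's parameters enabled.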
arXiv Detail & Related papers (2023-08-18T15:11:16Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach for efficient vision-language (VL) transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- HyperPELT: Unified Parameter-Efficient Language Model Tuning for Both Language and Vision-and-Language Tasks [38.43269863509866]
Parameter-efficient fine-tuning has become important for fast transfer learning and deployment.
We design a novel unified parameter-efficient transfer learning framework that works effectively on both pure language and V&L tasks.
Our proposed framework adds fewer trainable parameters in multi-task learning while achieving superior performance and transfer ability compared to state-of-the-art methods.
arXiv Detail & Related papers (2022-03-08T06:51:33Z)
- Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter-efficient transfer learning architecture, termed PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive ablation experiments to show the effectiveness of the learned user representation on five downstream tasks.
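A hedged sketch of the "injected, re-learned networks" idea: small residual patch modules wrap frozen pre-trained layers, so only the patches are updated. PeterRec itself patches convolutional sequential recommenders; the linear bottleneck below is a simplification.

```python
import torch.nn as nn

class ModelPatch(nn.Module):
    """Hedged sketch of an injected, re-learned "patch": a small residual
    bottleneck wraps a frozen pre-trained layer, so only the patch is
    updated while the pre-trained weights remain unaltered."""

    def __init__(self, layer: nn.Module, dim: int, bottleneck: int = 16):
        super().__init__()
        self.layer = layer
        for p in self.layer.parameters():   # pre-trained weights stay unaltered
            p.requires_grad = False
        self.patch = nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU(),
                                   nn.Linear(bottleneck, dim))

    def forward(self, x):
        h = self.layer(x)
        return h + self.patch(h)            # only the patch is re-learned
```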
arXiv Detail & Related papers (2020-01-13T14:09:54Z)