TOAST: Transfer Learning via Attention Steering
- URL: http://arxiv.org/abs/2305.15542v2
- Date: Tue, 11 Jul 2023 17:57:06 GMT
- Title: TOAST: Transfer Learning via Attention Steering
- Authors: Baifeng Shi, Siyu Gai, Trevor Darrell, Xin Wang
- Abstract summary: Current transfer learning methods often fail to focus on task-relevant features.
We introduce Top-Down Attention Steering (TOAST), a novel transfer learning algorithm that steers the attention to task-specific features.
TOAST substantially improves performance across a range of fine-grained visual classification datasets.
- Score: 77.83191769502763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning involves adapting a pre-trained model to novel downstream
tasks. However, we observe that current transfer learning methods often fail to
focus on task-relevant features. In this work, we explore refocusing model
attention for transfer learning. We introduce Top-Down Attention Steering
(TOAST), a novel transfer learning algorithm that keeps the pre-trained
backbone frozen, selects task-relevant features in the output, and feeds those
features back to the model to steer the attention to the task-specific
features. By refocusing attention alone, TOAST achieves state-of-the-art
results on a number of transfer learning benchmarks while using only a small
number of tunable parameters. Compared to full fine-tuning, LoRA, and prompt
tuning, TOAST substantially improves performance across a range of fine-grained
visual classification datasets (e.g., 81.1% -> 86.2% on FGVC). TOAST also
outperforms the fully fine-tuned Alpaca and Vicuna models on
instruction-following language generation. Code is available at
https://github.com/bfshi/TOAST.
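For intuition, here is a minimal PyTorch sketch of the steering idea described in the abstract: a frozen encoder runs bottom-up, a selection module picks task-relevant output features, and the selection is fed back for a second, steered pass. The `TopDownSteering` class, the `select` module, and the additive feedback are illustrative assumptions, not the paper's actual design; the real implementation lives in the repository above.

```python
import torch
import torch.nn as nn

class TopDownSteering(nn.Module):
    """Toy steering module: run a frozen encoder bottom-up, select
    task-relevant output features, and feed them back as a top-down
    signal for a second, steered pass."""
    def __init__(self, dim=128, num_classes=10, depth=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        for p in self.encoder.parameters():        # pre-trained backbone stays frozen
            p.requires_grad = False
        self.select = nn.Linear(dim, dim)          # hypothetical feature-selection module
        self.head = nn.Linear(dim, num_classes)    # task head

    def forward(self, tokens):                     # tokens: (batch, num_patches, dim)
        bottom_up = self.encoder(tokens)           # first, purely bottom-up pass
        feedback = self.select(bottom_up)          # pick out task-relevant features
        steered = self.encoder(tokens + feedback)  # second pass steered by the feedback
        return self.head(steered.mean(dim=1))      # pool tokens and classify

logits = TopDownSteering()(torch.randn(2, 16, 128))  # e.g. 2 inputs, 16 patch tokens
```

Note that only `select` and `head` receive gradients here, which is what keeps the tunable-parameter count small relative to full fine-tuning.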
Related papers
- Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence [60.37934652213881]
Domain Adaptation (DA) facilitates knowledge transfer from a source domain to a related target domain.
This paper investigates a practical DA paradigm, namely Source data-Free Active Domain Adaptation (SFADA), where source data becomes inaccessible during adaptation.
We present Learn From The Learnt (LFTL), a novel SFADA paradigm that leverages knowledge from the source-pretrained model and from actively iterated models without extra overhead.
arXiv Detail & Related papers (2024-07-26T17:51:58Z)
- Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer [44.10678347943115]
Class-incremental learning (CIL) aims to enable models to continuously learn new classes while overcoming catastrophic forgetting.
In this paper, we revisit different parameter-efficient tuning (PET) methods within the context of continual learning.
We observe that adapter tuning demonstrates superiority over prompt-based methods, even without parameter expansion in each learning session.
arXiv Detail & Related papers (2024-03-29T05:23:12Z)
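Since the entry above credits plain adapter tuning, a generic bottleneck adapter (in the style of Houlsby et al.) is sketched below for reference. The paper's semantically-shifted variant is not detailed in the abstract, so everything here is a standard baseline, not their method.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, plus a
    residual connection, inserted after a frozen transformer sub-layer."""
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)       # only these small matrices train
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):                            # x: hidden states from a frozen block
        return x + self.up(self.act(self.down(x)))   # residual keeps the frozen path intact

out = Adapter()(torch.randn(2, 16, 768))             # shape preserved: (2, 16, 768)
```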
- Revisiting the Power of Prompt for Visual Tuning [50.11465784194896]
This study explores how the correlation between prompts and patch tokens evolves over the course of training.
Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes.
Our method significantly advances adaptation for self-supervised pretraining, achieving task performance gains of 10% to 30%.
arXiv Detail & Related papers (2024-02-04T07:49:02Z)
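The abstract above does not say how the downstream token prototypes are computed; one plausible reading, sketched below, is a small k-means over downstream patch tokens whose centers become the initial prompt embeddings. `prompts_from_prototypes` is a hypothetical helper, not the paper's code.

```python
import torch

def prompts_from_prototypes(patch_tokens, num_prompts=8, iters=10):
    """Initialize prompt embeddings from prototypes of downstream patch
    tokens, here via a small k-means loop (one plausible reading; the
    abstract does not specify how prototypes are computed)."""
    idx = torch.randperm(patch_tokens.size(0))[:num_prompts]
    prompts = patch_tokens[idx].clone()            # random tokens as initial centers
    for _ in range(iters):                         # a few Lloyd iterations
        assign = torch.cdist(patch_tokens, prompts).argmin(dim=1)
        for k in range(num_prompts):
            members = patch_tokens[assign == k]
            if len(members) > 0:
                prompts[k] = members.mean(dim=0)   # center = token prototype
    return prompts                                 # (num_prompts, dim)

prompts = prompts_from_prototypes(torch.randn(1000, 768))
```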
- PIVOT: Prompting for Video Continual Learning [50.80141083993668]
We introduce PIVOT, a novel method that leverages extensive knowledge in pre-trained models from the image domain.
Our experiments show that PIVOT outperforms state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.
arXiv Detail & Related papers (2022-12-09T13:22:27Z)
- Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning [19.254454866466187]
We propose visual query tuning (VQT) to aggregate intermediate features of Vision Transformers.
As VQT keeps the intermediate features intact and only learns to combine them, it enjoys memory efficiency in training.
VQT consistently surpasses the state-of-the-art approach that utilizes intermediate features for transfer learning.
arXiv Detail & Related papers (2022-12-06T18:39:45Z)
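A rough sketch of the VQT idea as described above: a learnable query per layer summarizes that layer's intermediate tokens without modifying them, and only the queries and the head are trained. The single-query, dot-product attention form below is an assumption made for brevity, not the paper's exact aggregation.

```python
import torch
import torch.nn as nn

class QuerySummarizer(nn.Module):
    """VQT-style aggregation sketch: one learnable query per layer attends
    over that layer's (unmodified) intermediate tokens; the per-layer
    summaries feed a linear head. Only the queries and head are trained."""
    def __init__(self, dim=768, num_layers=12, num_classes=100):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_layers, dim) * 0.02)
        self.head = nn.Linear(num_layers * dim, num_classes)

    def forward(self, layer_tokens):               # list of (B, N, dim), one per layer
        summaries = []
        for q, tokens in zip(self.queries, layer_tokens):
            attn = (tokens @ q).softmax(dim=1)     # (B, N) attention over tokens
            summaries.append(torch.einsum("bn,bnd->bd", attn, tokens))
        return self.head(torch.cat(summaries, dim=-1))

feats = [torch.randn(2, 16, 768) for _ in range(12)]  # frozen ViT intermediates
logits = QuerySummarizer()(feats)                     # (2, 100)
```

Because the intermediate features are read but never altered, no backbone activations need to be stored for gradient computation, which is where the memory efficiency comes from.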
- Multi-Modal Few-Shot Temporal Action Detection [157.96194484236483]
Few-shot (FS) and zero-shot (ZS) learning are two different approaches for scaling temporal action detection (TAD) to new classes.
We introduce a new multi-modality few-shot (MMFS) TAD problem, which can be considered a marriage of FS-TAD and ZS-TAD.
arXiv Detail & Related papers (2022-11-27T18:13:05Z)
- Task Residual for Tuning Vision-Language Models [69.22958802711017]
We propose a new efficient tuning approach for vision-language models (VLMs) named Task Residual Tuning (TaskRes).
TaskRes explicitly decouples the prior knowledge of the pre-trained models and new knowledge regarding a target task.
TaskRes is simple yet effective, significantly outperforming previous methods on 11 benchmark datasets.
arXiv Detail & Related papers (2022-11-18T15:09:03Z)
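The decoupling described above can be made concrete in a few lines: freeze a prior classifier (for VLMs, typically the text embeddings of the class names) and learn only an additive residual. The `TaskResidual` class and the `alpha` scaling below are a simplified reading of the abstract, not the authors' exact code.

```python
import torch
import torch.nn as nn

class TaskResidual(nn.Module):
    """Sketch of the task-residual idea: the classifier is a frozen prior
    (e.g. CLIP text embeddings of the class names) plus a small learnable
    residual, keeping old and new knowledge explicitly separate."""
    def __init__(self, base_weights, alpha=0.5):
        super().__init__()
        self.register_buffer("base", base_weights)        # frozen prior knowledge
        self.residual = nn.Parameter(torch.zeros_like(base_weights))
        self.alpha = alpha                                # residual scaling factor

    def forward(self, image_feats):                       # (B, dim) image features
        weights = self.base + self.alpha * self.residual  # task-adapted classifier
        return image_feats @ weights.t()                  # (B, num_classes) logits

text_emb = torch.randn(100, 512)                          # stand-in for CLIP text features
logits = TaskResidual(text_emb)(torch.randn(8, 512))
```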
- Optimizing Active Learning for Low Annotation Budgets [6.753808772846254]
In deep learning, active learning is usually implemented as an iterative process in which successive deep models are updated via fine-tuning.
We tackle this issue by using an approach inspired by transfer learning.
We introduce a novel acquisition function which exploits the iterative nature of the AL process to select samples in a more robust fashion.
arXiv Detail & Related papers (2022-01-18T18:53:10Z)
- SSAST: Self-Supervised Audio Spectrogram Transformer [19.09439093130855]
We propose to pretrain the Audio Spectrogram Transformer (AST) model with joint discriminative and generative masked spectrogram patch modeling (MSPM) using unlabeled audio.
We evaluate our pretrained models on both audio and speech classification tasks including audio event classification, keyword spotting, emotion recognition, and speaker identification.
To the best of our knowledge, it is the first patch-based self-supervised learning framework in the audio and speech domain, and also the first self-supervised learning framework for AST.
arXiv Detail & Related papers (2021-10-19T07:58:28Z)
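A sketch of the masking step behind MSPM: hide a random subset of spectrogram patches per clip, then train the transformer to discriminate and reconstruct them. The zero-fill below stands in for whatever mask token the paper actually uses; `mask_patches` is an illustrative helper, not the authors' code.

```python
import torch

def mask_patches(patches, mask_ratio=0.4):
    """Mask a random subset of spectrogram patches per clip; the model is
    then trained to discriminate and reconstruct the hidden patches
    (the joint discriminative + generative MSPM objective)."""
    B, N, D = patches.shape
    num_masked = int(N * mask_ratio)
    idx = torch.rand(B, N).argsort(dim=1)[:, :num_masked]  # random patch indices
    mask = torch.zeros(B, N, dtype=torch.bool)
    mask.scatter_(1, idx, True)                            # True = patch is hidden
    masked = patches.masked_fill(mask.unsqueeze(-1), 0.0)  # zeros stand in for a
    return masked, mask                                    # learnable mask token

masked, mask = mask_patches(torch.randn(2, 512, 256))      # 512 patches per spectrogram
```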
- Investigating Transferability in Pretrained Language Models [8.83046338075119]
We consider a simple ablation technique for determining the impact of each pretrained layer on transfer task performance.
This technique reveals that in BERT, layers with high probing performance on downstream GLUE tasks are neither necessary nor sufficient for high accuracy on those tasks.
arXiv Detail & Related papers (2020-04-30T17:23:19Z)
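The ablation in the entry above is easy to picture in code: re-initialize one pretrained layer at a time and measure the resulting accuracy drop on the transfer task. `eval_fn` below is an assumed user-supplied fine-tune-and-evaluate routine; the probe itself is just a loop.

```python
import copy
import torch

def layer_ablation_impact(model, layer_names, eval_fn):
    """Re-initialize one pretrained layer at a time and record the drop in
    downstream accuracy. eval_fn(model) -> accuracy is an assumed
    user-supplied routine (fine-tune on the GLUE task, then evaluate)."""
    baseline = eval_fn(model)
    impact = {}
    for name in layer_names:                        # e.g. "encoder.layer.3" in BERT
        ablated = copy.deepcopy(model)
        for p in ablated.get_submodule(name).parameters():
            torch.nn.init.normal_(p, std=0.02)      # wipe the pretrained weights
        impact[name] = baseline - eval_fn(ablated)  # accuracy drop = layer importance
    return impact
```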