GTA: Guided Transfer of Spatial Attention from Object-Centric
Representations
- URL: http://arxiv.org/abs/2401.02656v1
- Date: Fri, 5 Jan 2024 06:24:41 GMT
- Title: GTA: Guided Transfer of Spatial Attention from Object-Centric
Representations
- Authors: SeokHyun Seo, Jinwoo Hong, JungWoo Chae, Kyungyul Kim, Sangheum Hwang
- Abstract summary: We propose a novel and simple regularization method for ViT called Guided Transfer of spatial Attention (GTA).
Our experimental results show that the proposed GTA consistently improves the accuracy across five benchmark datasets especially when the number of training data is small.
- Score: 3.187381965457262
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Utilizing well-trained representations in transfer learning often results in
superior performance and faster convergence compared to training from scratch.
However, even if such good representations are transferred, a model can easily
overfit the limited training dataset and lose the valuable properties of the
transferred representations. This phenomenon is more severe in ViT due to its
low inductive bias. Through experimental analysis using attention maps in ViT,
we observe that the rich representations deteriorate when trained on a small
dataset. Motivated by this finding, we propose a novel and simple
regularization method for ViT called Guided Transfer of spatial Attention
(GTA). Our proposed method regularizes the self-attention maps between the
source and target models. A target model can fully exploit the knowledge
related to object localization properties through this explicit regularization.
Our experimental results show that the proposed GTA consistently improves the
accuracy across five benchmark datasets especially when the number of training
data is small.
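The regularization described in the abstract can be pictured with a minimal sketch. The abstract does not give the exact loss, so everything here is an assumption for illustration: the squared-error distance, the `weight` parameter, and the helper names `attention_gap` and `regularized_loss` are hypothetical and not taken from the paper. The sketch treats one attention map as a probability distribution over image patches and penalizes the target model when its map drifts away from the source model's.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_gap(source_attn, target_attn):
    """Mean squared difference between two attention distributions.

    The squared-error distance is an assumption; the abstract only says
    the attention maps of the source and target models are regularized.
    """
    if len(source_attn) != len(target_attn):
        raise ValueError("attention maps must cover the same patches")
    squared = [(s - t) ** 2 for s, t in zip(source_attn, target_attn)]
    return sum(squared) / len(squared)


def regularized_loss(task_loss, source_attn, target_attn, weight=1.0):
    """Task loss plus a weighted attention penalty (hypothetical form)."""
    return task_loss + weight * attention_gap(source_attn, target_attn)


# Toy example with four patches: the source model attends mostly to patch 0,
# while the fine-tuned target model has drifted to a uniform map.
source = softmax([2.0, 0.5, 0.1, 0.1])
target = softmax([0.3, 0.3, 0.3, 0.3])
print(regularized_loss(0.7, source, target, weight=10.0))
```

In this toy setup the penalty is zero when the two maps agree and grows as the target's attention spreads away from the object the source model localizes, which is the behavior the abstract attributes to the regularizer.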
Related papers
- Enhancing Performance of Vision Transformers on Small Datasets through
Local Inductive Bias Incorporation [13.056764072568749]
Vision transformers (ViTs) achieve remarkable performance on large datasets, but tend to perform worse than convolutional neural networks (CNNs) on smaller datasets.
We propose a module called Local InFormation Enhancer (LIFE) that extracts patch-level local information and incorporates it into the embeddings used in the self-attention block of ViTs.
Our proposed module is memory- and computation-efficient, as well as flexible enough to process auxiliary tokens such as the classification and distillation tokens.
arXiv Detail & Related papers (2023-05-15T11:23:18Z)
- Teacher Guided Training: An Efficient Framework for Knowledge Transfer [86.6784627427194]
We propose the teacher-guided training (TGT) framework for training a high-quality compact model.
TGT exploits the fact that the teacher has acquired a good representation of the underlying data domain.
We find that TGT can improve accuracy on several image classification benchmarks and a range of text classification and retrieval tasks.
arXiv Detail & Related papers (2022-08-14T10:33:58Z)
- Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning -- simultaneously training a single model on multiple "upstream" and "downstream" tasks.
We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data.
We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
arXiv Detail & Related papers (2022-07-08T10:25:47Z)
- Understanding new tasks through the lens of training data via
exponential tilting [43.33775132139584]
We consider the problem of reweighing the training samples to gain insights into the distribution of the target task.
We formulate a distribution shift model based on the exponential tilt assumption and learn train data importance weights.
The learned train data weights can then be used for downstream tasks such as target performance evaluation, fine-tuning, and model selection.
arXiv Detail & Related papers (2022-05-26T18:38:43Z)
- PreTraM: Self-Supervised Pre-training via Connecting Trajectory and Map [58.53373202647576]
We propose PreTraM, a self-supervised pre-training scheme for trajectory forecasting.
It consists of two parts: 1) Trajectory-Map Contrastive Learning, where we project trajectories and maps to a shared embedding space with cross-modal contrastive learning, and 2) Map Contrastive Learning, where we enhance map representation with contrastive learning on large quantities of HD-maps.
On top of popular baselines such as AgentFormer and Trajectron++, PreTraM boosts their performance by 5.5% and 6.9% relatively in FDE-10 on the challenging nuScenes dataset.
arXiv Detail & Related papers (2022-04-21T23:01:21Z)
- Self-Promoted Supervision for Few-Shot Transformer [178.52948452353834]
Self-promoted sUpervisioN (SUN) is a few-shot learning framework for vision transformers (ViTs).
SUN pretrains the ViT on the few-shot learning dataset and then uses it to generate individual location-specific supervision for guiding each patch token.
Experiments show that SUN using ViTs significantly surpasses other few-shot learning frameworks with ViTs and is the first to achieve higher performance than CNN-based state-of-the-art methods.
arXiv Detail & Related papers (2022-03-14T12:53:27Z)
- How Well Do Sparse Imagenet Models Transfer? [75.98123173154605]
Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" datasets.
In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset.
We show that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities.
arXiv Detail & Related papers (2021-11-26T11:58:51Z)
- Transferring and Regularizing Prediction for Semantic Segmentation [115.88957139226966]
In this paper, we exploit the intrinsic properties of semantic segmentation to alleviate this problem for model transfer.
We present a Regularizer of Prediction Transfer (RPT) that imposes the intrinsic properties as constraints to regularize model transfer in an unsupervised fashion.
Extensive experiments are conducted to verify the proposal of RPT on the transfer of models trained on GTA5 and SYNTHIA (synthetic data) to the Cityscapes dataset (urban street scenes).
arXiv Detail & Related papers (2020-06-11T16:19:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.