TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation
- URL: http://arxiv.org/abs/2108.05988v1
- Date: Thu, 12 Aug 2021 22:37:43 GMT
- Title: TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation
- Authors: Jinyu Yang, Jingjing Liu, Ning Xu, Junzhou Huang
- Abstract summary: Unsupervised domain adaptation (UDA) aims to transfer the knowledge learnt from a labeled source domain to an unlabeled target domain.
Previous work is mainly built upon convolutional neural networks (CNNs) to learn domain-invariant representations.
With the recent exponential increase in applying Vision Transformer (ViT) to vision tasks, the capability of ViT in adapting cross-domain knowledge remains unexplored in the literature.
- Score: 54.61786380919243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised domain adaptation (UDA) aims to transfer the knowledge learnt
from a labeled source domain to an unlabeled target domain. Previous work is
mainly built upon convolutional neural networks (CNNs) to learn
domain-invariant representations. With the recent exponential increase in
applying Vision Transformer (ViT) to vision tasks, the capability of ViT in
adapting cross-domain knowledge, however, remains unexplored in the literature.
To fill this gap, this paper first comprehensively investigates the
transferability of ViT on a variety of domain adaptation tasks. Surprisingly,
ViT demonstrates superior transferability over its CNN-based counterparts by a
large margin, and the performance can be further improved by incorporating
adversarial adaptation. Nevertheless, directly applying CNN-based adaptation
strategies fails to take advantage of ViT's intrinsic merits (e.g., the
attention mechanism and sequential image representation), which play an
important role in knowledge transfer. To remedy this, we propose a unified
framework, namely Transferable Vision Transformer (TVT), to fully exploit the
transferability of ViT for domain adaptation. Specifically, we devise a novel
and effective unit, termed the Transferability Adaption Module (TAM). By
injecting learned transferabilities into attention blocks, TAM compels ViT to
focus on both transferable and discriminative features. In addition, we
leverage discriminative clustering to enhance feature diversity and separation,
which are undermined during adversarial domain alignment. To verify its
versatility, we perform extensive studies of TVT on four benchmarks, and the
experimental results demonstrate that TVT attains significant improvements
over existing state-of-the-art UDA methods.
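The abstract describes the TAM only at a high level, so the following is a minimal sketch of the underlying idea: rescaling the class token's attention over patch tokens by learned per-patch transferability scores. The module name, its interface, and the assumption that the scores come from a patch-level domain discriminator are illustrative choices for this sketch, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class TransferabilityWeightedAttention(nn.Module):
    """Multi-head self-attention whose [CLS]-token attention over patch
    tokens is rescaled by per-patch transferability scores (assumed to be
    produced by a patch-level domain discriminator). Illustrative only."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, transferability):
        # x: (B, 1 + N, D) token sequence with a leading [CLS] token
        # transferability: (B, N) scores in [0, 1], one per patch token
        B, T, D = x.shape
        qkv = self.qkv(x).reshape(B, T, 3, self.num_heads, D // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each: (B, H, T, d)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, H, T, T)
        attn = attn.softmax(dim=-1)

        # Inject transferability into the [CLS] row so that patches judged
        # more transferable contribute more to the pooled representation.
        cls_row = attn[:, :, 0, :].clone()              # (B, H, 1 + N)
        cls_row[:, :, 1:] = cls_row[:, :, 1:] * transferability[:, None, :]
        cls_row = cls_row / cls_row.sum(dim=-1, keepdim=True).clamp_min(1e-6)
        attn = torch.cat([cls_row.unsqueeze(2), attn[:, :, 1:, :]], dim=2)

        out = (attn @ v).transpose(1, 2).reshape(B, T, D)
        return self.proj(out)
```

In a full TVT-style pipeline such a module would presumably sit inside a transformer block, with the transferability scores trained adversarially alongside the classification and domain-alignment objectives; those integration details are assumptions here, not statements about the paper's code.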
Related papers
- Transferable-guided Attention Is All You Need for Video Domain Adaptation [42.642008092347986]
Unsupervised domain adaptation (UDA) in videos is a challenging task that remains underexplored compared to image-based UDA techniques.
Our key idea is to use transformer layers as a feature encoder and incorporate spatial and temporal transferability relationships into the attention mechanism.
A Transferable-guided Attention (TransferAttn) framework is then developed to exploit the capacity of the transformer to adapt cross-domain knowledge.
arXiv Detail & Related papers (2024-07-01T15:29:27Z)
- Vision Transformer-based Adversarial Domain Adaptation [5.611768906855499]
Vision transformer (ViT) has attracted tremendous attention since its emergence and has been widely used in various computer vision tasks.
In this paper, we fill this gap by employing ViT as the feature extractor in adversarial domain adaptation.
We empirically demonstrate that ViT can be a plug-and-play component in adversarial domain adaptation; a generic sketch of this setup appears after this list.
arXiv Detail & Related papers (2024-04-24T11:41:28Z)
- Improving Source-Free Target Adaptation with Vision Transformers Leveraging Domain Representation Images [8.626222763097335]
Unsupervised Domain Adaptation (UDA) methods facilitate knowledge transfer from a labeled source domain to an unlabeled target domain.
This paper presents an innovative method to bolster ViT performance in source-free target adaptation, beginning with an evaluation of how key, query, and value elements affect ViT outcomes.
Domain Representation Images (DRIs) act as domain-specific markers that integrate seamlessly into the training regimen.
arXiv Detail & Related papers (2023-11-21T13:26:13Z)
- On the Transferability of Visually Grounded PCFGs [35.64371385720051]
We study the Visually-grounded Compound PCFG (Zhao and Titov, 2020).
We consider a zero-shot transfer learning setting where a model is trained on the source domain and is directly applied to target domains, without any further training.
Our experimental results suggest that the benefits of using visual groundings transfer to text in a domain similar to the training domain, but fail to transfer to remote domains.
arXiv Detail & Related papers (2023-10-21T20:19:51Z)
- Deeper Insights into ViTs Robustness towards Common Corruptions [82.79764218627558]
We investigate how CNN-like architectural designs and CNN-based data augmentation strategies impact ViTs' robustness towards common corruptions.
We demonstrate that overlapping patch embedding and a convolutional Feed-Forward Network (FFN) boost robustness.
We also introduce a novel conditional method enabling input-varied augmentations from two angles.
arXiv Detail & Related papers (2022-04-26T08:22:34Z)
- Safe Self-Refinement for Transformer-based Domain Adaptation [73.8480218879]
Unsupervised Domain Adaptation (UDA) aims to leverage a label-rich source domain to solve tasks on a related unlabeled target domain.
It is a challenging problem especially when a large domain gap lies between the source and target domains.
We propose a novel solution named SSRT (Safe Self-Refinement for Transformer-based domain adaptation), which brings improvement from two aspects.
arXiv Detail & Related papers (2022-04-16T00:15:46Z)
- ConvNets vs. Transformers: Whose Visual Representations are More Transferable? [49.62201738334348]
We investigate the transfer learning ability of ConvNets and vision transformers in 15 single-task and multi-task performance evaluations.
We observe consistent advantages of Transformer-based backbones on 13 downstream tasks.
arXiv Detail & Related papers (2021-08-11T16:20:38Z)
- On Improving Adversarial Transferability of Vision Transformers [97.17154635766578]
Vision transformers (ViTs) process input images as sequences of patches via self-attention.
We study the adversarial feature space of ViT models and their transferability.
We introduce two novel strategies specific to the architecture of ViT models.
arXiv Detail & Related papers (2021-06-08T08:20:38Z)
- Transformer-Based Source-Free Domain Adaptation [134.67078085569017]
We study the task of source-free domain adaptation (SFDA), where the source data are not available during target adaptation.
We propose a generic and effective framework based on Transformer, named TransDA, for learning a generalized model for SFDA.
arXiv Detail & Related papers (2021-05-28T23:06:26Z)
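Several of the entries above, as well as the TVT abstract itself, rely on adversarial adaptation with a ViT feature extractor. The sketch below shows a generic DANN-style domain discriminator with a gradient reversal layer applied to a (B, D) [CLS] feature. The `vit` backbone referenced in the comments, the helper names, and the hyperparameters are hypothetical; this is not the implementation of any specific paper listed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in backward so
    the feature extractor learns to fool the domain discriminator."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DomainDiscriminator(nn.Module):
    """Binary classifier predicting source (0) vs. target (1) domain."""

    def __init__(self, dim, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feat, lambd=1.0):
        return self.net(GradReverse.apply(feat, lambd))

# Usage with any ViT backbone returning a (B, D) [CLS] feature (hypothetical
# `vit`): feat_src, feat_tgt = vit(x_source), vit(x_target). The adversarial
# loss below is added to the usual source classification loss.
def adversarial_loss(disc, feat_src, feat_tgt, lambd=1.0):
    feats = torch.cat([feat_src, feat_tgt], dim=0)
    labels = torch.cat([torch.zeros(len(feat_src), 1),
                        torch.ones(len(feat_tgt), 1)], dim=0).to(feats.device)
    return F.binary_cross_entropy_with_logits(disc(feats, lambd), labels)
```

The gradient reversal layer lets a single backward pass train the discriminator to separate the domains while pushing the feature extractor toward domain-invariant representations.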