TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation
- URL: http://arxiv.org/abs/2108.05988v1
- Date: Thu, 12 Aug 2021 22:37:43 GMT
- Title: TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation
- Authors: Jinyu Yang, Jingjing Liu, Ning Xu, Junzhou Huang
- Abstract summary: Unsupervised domain adaptation (UDA) aims to transfer the knowledge learnt from a labeled source domain to an unlabeled target domain.
Previous work is mainly built upon convolutional neural networks (CNNs) to learn domain-invariant representations.
Despite the recent surge in applying the Vision Transformer (ViT) to vision tasks, ViT's capability to adapt cross-domain knowledge remains unexplored in the literature.
- Score: 54.61786380919243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised domain adaptation (UDA) aims to transfer the knowledge learnt from a labeled source domain to an unlabeled target domain. Previous work is mainly built upon convolutional neural networks (CNNs) to learn domain-invariant representations. Despite the recent surge in applying the Vision Transformer (ViT) to vision tasks, ViT's capability to adapt cross-domain knowledge remains unexplored in the literature. To fill this gap, this paper first comprehensively investigates the transferability of ViT on a variety of domain adaptation tasks. Surprisingly, ViT demonstrates superior transferability over its CNN-based counterparts by a large margin, and its performance can be further improved by incorporating adversarial adaptation. Nevertheless, directly applying CNN-based adaptation strategies fails to take advantage of ViT's intrinsic merits (e.g., the attention mechanism and the sequential image representation), which play an important role in knowledge transfer. To remedy this, we propose a unified framework, namely Transferable Vision Transformer (TVT), to fully exploit the transferability of ViT for domain adaptation. Specifically, we devise a novel and effective unit, termed the Transferability Adaption Module (TAM). By injecting learned transferabilities into attention blocks, TAM compels ViT to focus on both transferable and discriminative features. In addition, we leverage discriminative clustering to enhance feature diversity and separation, which are undermined during adversarial domain alignment. To verify its versatility, we perform extensive studies of TVT on four benchmarks, and the experimental results demonstrate that TVT attains significant improvements over existing state-of-the-art UDA methods.
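The abstract does not come with code, so the following PyTorch sketch is only a rough, hedged illustration of the idea behind TAM: a patch-level domain discriminator scores every token, the entropy of that score is treated as a learned "transferability", and the [CLS]-token attention over patches is rescaled accordingly. All module and variable names (PatchDiscriminator, TransferabilityWeightedAttention, etc.) are illustrative assumptions, not the authors' implementation, and the discriminative-clustering objective mentioned in the abstract is omitted.

```python
# Minimal, illustrative sketch (assumption, not the authors' code): a
# patch-level domain discriminator scores every token, the binary entropy of
# that score serves as a "transferability" weight, and the [CLS]-token
# attention over patches is rescaled by those weights.
import math

import torch
import torch.nn as nn


class PatchDiscriminator(nn.Module):
    """Predicts, per patch token, whether it comes from source or target."""

    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(dim, dim // 2), nn.GELU(), nn.Linear(dim // 2, 1)
        )

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, D) -> per-patch domain probability: (B, N)
        return torch.sigmoid(self.head(patch_tokens)).squeeze(-1)


def transferability(domain_prob: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Binary entropy of the domain prediction: patches the discriminator
    # cannot tell apart (prob ~ 0.5) get weight ~ 1, i.e. they are treated
    # as domain-invariant and therefore transferable.
    p = domain_prob.clamp(eps, 1 - eps)
    return -(p * p.log() + (1 - p) * (1 - p).log()) / math.log(2.0)


class TransferabilityWeightedAttention(nn.Module):
    """Multi-head self-attention whose [CLS] row is rescaled per patch."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, patch_weight: torch.Tensor) -> torch.Tensor:
        # x: (B, 1 + N, D) with the [CLS] token first; patch_weight: (B, N)
        B, T, D = x.shape
        qkv = self.qkv(x).reshape(B, T, 3, self.num_heads, D // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each (B, H, T, d)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)

        # Inject transferability: rescale how strongly the [CLS] token attends
        # to each patch (the CLS->CLS entry keeps weight 1), then renormalize.
        w = torch.cat([torch.ones_like(patch_weight[:, :1]), patch_weight], dim=1)
        cls_row = attn[:, :, :1, :] * w[:, None, None, :]
        cls_row = cls_row / cls_row.sum(dim=-1, keepdim=True).clamp_min(1e-6)
        attn = torch.cat([cls_row, attn[:, :, 1:, :]], dim=2)

        out = (attn @ v).transpose(1, 2).reshape(B, T, D)
        return self.proj(out)
```

A full TAM would train the patch discriminator adversarially alongside the backbone and combine this with the discriminative-clustering loss; the sketch above only shows the weighting mechanism described in the abstract.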
Related papers
- Feature Fusion Transferability Aware Transformer for Unsupervised Domain Adaptation [1.9035011984138845]
Unsupervised domain adaptation (UDA) aims to leverage the knowledge learned from labeled source domains to improve performance on unlabeled target domains.
Recent research has shown promise in applying Vision Transformers (ViTs) to this task.
We propose a novel Feature Fusion Transferability Aware Transformer (FFTAT) to enhance ViT performance in UDA tasks.
arXiv Detail & Related papers (2024-11-10T22:23:12Z)
- Transferable-guided Attention Is All You Need for Video Domain Adaptation [42.642008092347986]
Unsupervised domain adaptation (UDA) in videos is a challenging task that remains under-explored compared to image-based UDA techniques.
Our key idea is to use transformer layers as a feature encoder and incorporate spatial and temporal transferability relationships into the attention mechanism.
A Transferable-guided Attention (TransferAttn) framework is then developed to exploit the capacity of the transformer to adapt cross-domain knowledge.
arXiv Detail & Related papers (2024-07-01T15:29:27Z)
- Vision Transformer-based Adversarial Domain Adaptation [5.611768906855499]
Vision transformer (ViT) has attracted tremendous attention since its emergence and has been widely used in various computer vision tasks.
In this paper, we fill this gap by employing the ViT as the feature extractor in adversarial domain adaptation.
We empirically demonstrate that ViT can be a plug-and-play component in adversarial domain adaptation; a generic sketch of this setup is given after this list.
arXiv Detail & Related papers (2024-04-24T11:41:28Z)
- Improving Source-Free Target Adaptation with Vision Transformers Leveraging Domain Representation Images [8.626222763097335]
Unsupervised Domain Adaptation (UDA) methods facilitate knowledge transfer from a labeled source domain to an unlabeled target domain.
This paper presents an innovative method to bolster ViT performance in source-free target adaptation, beginning with an evaluation of how key, query, and value elements affect ViT outcomes.
Domain Representation Images (DRIs) act as domain-specific markers, effortlessly merging with the training regimen.
arXiv Detail & Related papers (2023-11-21T13:26:13Z)
- Deeper Insights into ViTs Robustness towards Common Corruptions [82.79764218627558]
We investigate how CNN-like architectural designs and CNN-based data augmentation strategies affect ViTs' robustness to common corruptions.
We demonstrate that overlapping patch embedding and a convolutional Feed-Forward Network (FFN) improve robustness.
We also introduce a novel conditional method enabling input-varied augmentations from two angles.
arXiv Detail & Related papers (2022-04-26T08:22:34Z)
- Safe Self-Refinement for Transformer-based Domain Adaptation [73.8480218879]
Unsupervised Domain Adaptation (UDA) aims to leverage a label-rich source domain to solve tasks on a related unlabeled target domain.
It is a challenging problem especially when a large domain gap lies between the source and target domains.
We propose a novel solution named SSRT (Safe Self-Refinement for Transformer-based domain adaptation), which brings improvement from two aspects.
arXiv Detail & Related papers (2022-04-16T00:15:46Z)
- ConvNets vs. Transformers: Whose Visual Representations are More Transferable? [49.62201738334348]
We investigate the transfer learning ability of ConvNets and vision transformers in 15 single-task and multi-task performance evaluations.
We observe consistent advantages of Transformer-based backbones on 13 downstream tasks.
arXiv Detail & Related papers (2021-08-11T16:20:38Z)
- On Improving Adversarial Transferability of Vision Transformers [97.17154635766578]
Vision transformers (ViTs) process input images as sequences of patches via self-attention.
We study the adversarial feature space of ViT models and their transferability.
We introduce two novel strategies specific to the architecture of ViT models.
arXiv Detail & Related papers (2021-06-08T08:20:38Z)
- Transformer-Based Source-Free Domain Adaptation [134.67078085569017]
We study the task of source-free domain adaptation (SFDA), where the source data are not available during target adaptation.
We propose a generic and effective framework based on Transformer, named TransDA, for learning a generalized model for SFDA.
arXiv Detail & Related papers (2021-05-28T23:06:26Z)
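Several entries above, like the TVT abstract itself, build on adversarial adaptation with a ViT backbone (see the "Vision Transformer-based Adversarial Domain Adaptation" entry). As a hedged, generic sketch of that common setup rather than any specific paper's implementation, the following DANN-style training step pairs a ViT feature extractor with a gradient-reversal domain discriminator; `vit_backbone`, `classifier`, `domain_disc`, and the input batches are assumed placeholders.

```python
# Generic, hedged sketch of adversarial domain adaptation (DANN-style) with a
# ViT feature extractor. Not taken from any of the papers above; the backbone,
# classifier, discriminator, and batches are assumed placeholders.
import torch
import torch.nn as nn
from torch.autograd import Function


class GradReverse(Function):
    """Identity in the forward pass; reverses (and scales) gradients backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)


def adapt_step(vit_backbone, classifier, domain_disc, optimizer,
               src_images, src_labels, tgt_images, lambd=1.0):
    """One step: supervised loss on source + domain-confusion loss on both."""
    ce, bce = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss()

    src_feat = vit_backbone(src_images)   # e.g. the [CLS] embedding, shape (B, D)
    tgt_feat = vit_backbone(tgt_images)

    # Supervised classification on the labeled source domain.
    cls_loss = ce(classifier(src_feat), src_labels)

    # Domain discriminator sees reversed gradients, pushing the backbone
    # toward domain-invariant features.
    feats = torch.cat([src_feat, tgt_feat], dim=0)
    domain_logits = domain_disc(grad_reverse(feats, lambd)).squeeze(-1)
    domain_labels = torch.cat([torch.zeros(len(src_feat)),
                               torch.ones(len(tgt_feat))]).to(feats.device)
    adv_loss = bce(domain_logits, domain_labels)

    loss = cls_loss + adv_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice the reversal strength lambd is usually ramped up from 0 over training, so the backbone first learns a reasonable classifier before the domain-confusion signal dominates.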
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.