TFS-ViT: Token-Level Feature Stylization for Domain Generalization
- URL: http://arxiv.org/abs/2303.15698v3
- Date: Sat, 16 Mar 2024 04:21:44 GMT
- Title: TFS-ViT: Token-Level Feature Stylization for Domain Generalization
- Authors: Mehrdad Noori, Milad Cheraghalikhani, Ali Bahri, Gustavo A. Vargas Hakim, David Osowiechi, Ismail Ben Ayed, Christian Desrosiers
- Abstract summary: Vision Transformers (ViTs) have shown outstanding performance for a broad range of computer vision tasks.
This paper presents the first Token-level Feature Stylization (TFS-ViT) approach for domain generalization.
Our approach transforms token features by mixing the normalization statistics of images from different domains.
- Score: 17.82872117103924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Standard deep learning models such as convolutional neural networks (CNNs) lack the ability to generalize to domains that have not been seen during training. This problem is mainly due to the common but often invalid assumption that the source and target data come from the same i.i.d. distribution. Recently, Vision Transformers (ViTs) have shown outstanding performance for a broad range of computer vision tasks. However, very few studies have investigated their ability to generalize to new domains. This paper presents the first Token-level Feature Stylization (TFS-ViT) approach for domain generalization, which improves the performance of ViTs on unseen data by synthesizing new domains. Our approach transforms token features by mixing the normalization statistics of images from different domains. We further improve this approach with a novel strategy for attention-aware stylization, which uses the attention maps of class (CLS) tokens to compute and mix normalization statistics of tokens corresponding to different image regions. The proposed method is flexible in the choice of backbone model and can easily be applied to any ViT-based architecture with a negligible increase in computational complexity. Comprehensive experiments show that our approach achieves state-of-the-art performance on five challenging benchmarks for domain generalization, and demonstrate its ability to deal with different types of domain shifts. The implementation is available at: https://github.com/Mehrdad-Noori/TFS-ViT_Token-level_Feature_Stylization.
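The core idea described in the abstract — normalizing token features per image and re-styling them with statistics interpolated from another image in the batch — can be sketched in a few lines. This is a minimal illustration in the spirit of MixStyle-like stylization, not the paper's exact implementation: the function name, the Beta-mixing scheme, and the choice of computing one mean/std per channel over the token axis are assumptions; the attention-aware variant is omitted.

```python
import torch


def token_feature_stylization(x: torch.Tensor, alpha: float = 0.1,
                              eps: float = 1e-6) -> torch.Tensor:
    """Re-style token features by mixing per-image normalization statistics.

    x: token features of shape (B, N, D) -- batch, tokens, embedding dim.
    """
    # Per-image statistics over the token axis: one mean/std per channel.
    mu = x.mean(dim=1, keepdim=True)           # (B, 1, D)
    sigma = x.std(dim=1, keepdim=True) + eps   # (B, 1, D)
    x_norm = (x - mu) / sigma

    # Pair each image with a randomly chosen other image in the batch and
    # interpolate their statistics with a Beta-distributed weight.
    perm = torch.randperm(x.size(0))
    lam = torch.distributions.Beta(alpha, alpha).sample((x.size(0), 1, 1))
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sigma_mix = lam * sigma + (1 - lam) * sigma[perm]

    # Re-apply the mixed style; shape is unchanged, so the operation can be
    # dropped between transformer blocks during training only.
    return x_norm * sigma_mix + mu_mix
```

Because the output has the same shape as the input, such a layer can be inserted after arbitrary ViT blocks (and skipped at inference), which is what makes this family of methods backbone-agnostic.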
Related papers
- START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation [27.301312891532277]
Domain Generalization (DG) aims to enable models to generalize to unseen target domains by learning from multiple source domains.
We propose START, which achieves state-of-the-art (SOTA) performances and offers a competitive alternative to CNNs and ViTs.
Our START can selectively perturb and suppress domain-specific features in salient tokens within the input-dependent matrices of SSMs, thus effectively reducing the discrepancy between different domains.
arXiv Detail & Related papers (2024-10-21T13:50:32Z) - A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z) - StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization [85.18995948334592]
Single domain generalization (single DG) aims at learning a robust model generalizable to unseen domains from only one training domain.
State-of-the-art approaches have mostly relied on data augmentations, such as adversarial perturbation and style enhancement, to synthesize new data.
We propose StyDeSty, which explicitly accounts for the alignment of the source and pseudo domains in the process of data augmentation.
arXiv Detail & Related papers (2024-06-01T02:41:34Z) - Learning Domain Invariant Prompt for Vision-Language Models [31.581652862478965]
We propose a novel prompt learning paradigm, called MetaPrompt, that directly generates domain-invariant prompts that can be generalized to unseen domains.
Our method consistently and significantly outperforms existing methods.
arXiv Detail & Related papers (2022-12-08T11:23:24Z) - Vision Transformers: From Semantic Segmentation to Dense Prediction [139.15562023284187]
We explore the global context learning potentials of vision transformers (ViTs) for dense visual prediction.
Our motivation is that through learning global context at full receptive field layer by layer, ViTs may capture stronger long-range dependency information.
We formulate a family of Hierarchical Local-Global (HLG) Transformers, characterized by local attention within windows and global-attention across windows in a pyramidal architecture.
arXiv Detail & Related papers (2022-07-19T15:49:35Z) - Exploring Data Aggregation and Transformations to Generalize across
Visual Domains [0.0]
This thesis contributes to research on Domain Generalization (DG), Domain Adaptation (DA) and their variations.
We propose new frameworks for Domain Generalization and Domain Adaptation which make use of feature aggregation strategies and visual transformations.
We show how our proposed solutions outperform competitive state-of-the-art approaches in established DG and DA benchmarks.
arXiv Detail & Related papers (2021-08-20T14:58:14Z) - MixStyle Neural Networks for Domain Generalization and Adaptation [122.36901703868321]
MixStyle is a plug-and-play module that can improve domain generalization performance without the need to collect more data or increase model capacity.
Our experiments show that MixStyle can significantly boost out-of-distribution generalization performance across a wide range of tasks including image recognition, instance retrieval and reinforcement learning.
arXiv Detail & Related papers (2021-07-05T14:29:19Z) - Exploring Complementary Strengths of Invariant and Equivariant
Representations for Few-Shot Learning [96.75889543560497]
In many real-world problems, collecting a large number of labeled samples is infeasible.
Few-shot learning is the dominant approach to address this issue, where the objective is to quickly adapt to novel categories in presence of a limited number of samples.
We propose a novel training mechanism that simultaneously enforces equivariance and invariance to a general set of geometric transformations.
arXiv Detail & Related papers (2021-03-01T21:14:33Z) - Robust Domain-Free Domain Generalization with Class-aware Alignment [4.442096198968069]
Domain-Free Domain Generalization (DFDG) is a model-agnostic method to achieve better generalization performance on the unseen test domain.
DFDG uses novel strategies to learn domain-invariant class-discriminative features.
It obtains competitive performance on both time series sensor and image classification public datasets.
arXiv Detail & Related papers (2021-02-17T17:46:06Z) - Cross-Domain Facial Expression Recognition: A Unified Evaluation
Benchmark and Adversarial Graph Learning [85.6386289476598]
We develop a novel adversarial graph representation adaptation (AGRA) framework for cross-domain holistic-local feature co-adaptation.
We conduct extensive and fair evaluations on several popular benchmarks and show that the proposed AGRA framework outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2020-08-03T15:00:31Z) - A Universal Representation Transformer Layer for Few-Shot Image
Classification [43.31379752656756]
Few-shot classification aims to recognize unseen classes when presented with only a small number of samples.
We consider the problem of multi-domain few-shot image classification, where unseen classes and examples come from diverse data sources.
Here, we propose a Universal Representation Transformer layer, that meta-learns to leverage universal features for few-shot classification.
arXiv Detail & Related papers (2020-06-21T03:08:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.