Related papers: SETA: Semantic-Aware Token Augmentation for Domain Generalization

SETA: Semantic-Aware Token Augmentation for Domain Generalization

URL: http://arxiv.org/abs/2403.11792v2
Date: Mon, 21 Oct 2024 14:42:16 GMT
Title: SETA: Semantic-Aware Token Augmentation for Domain Generalization
Authors: Jintao Guo, Lei Qi, Yinghuan Shi, Yang Gao,
Abstract summary: Domain generalization (DG) aims to enhance the model against domain shifts without accessing target domains. Prior CNN-based augmentation methods on token-based models are suboptimal due to the lack of incentivizing the model to learn holistic shape information. We propose the SEmantic-aware Token Augmentation (SETA) method, which transforms features by perturbing local edge cues while preserving global shape features.
Score: 27.301312891532277
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Domain generalization (DG) aims to enhance the model robustness against domain shifts without accessing target domains. A prevalent category of methods for DG is data augmentation, which focuses on generating virtual samples to simulate domain shifts. However, existing augmentation techniques in DG are mainly tailored for convolutional neural networks (CNNs), with limited exploration in token-based architectures, i.e., vision transformer (ViT) and multi-layer perceptrons (MLP) models. In this paper, we study the impact of prior CNN-based augmentation methods on token-based models, revealing their performance is suboptimal due to the lack of incentivizing the model to learn holistic shape information. To tackle the issue, we propose the SEmantic-aware Token Augmentation (SETA) method. SETA transforms token features by perturbing local edge cues while preserving global shape features, thereby enhancing the model learning of shape information. To further enhance the generalization ability of the model, we introduce two stylized variants of our method combined with two state-of-the-art style augmentation methods in DG. We provide a theoretical insight into our method, demonstrating its effectiveness in reducing the generalization risk bound. Comprehensive experiments on five benchmarks prove that our method achieves SOTA performances across various ViT and MLP architectures. Our code is available at https://github.com/lingeringlight/SETA.

Related papers

Object Style Diffusion for Generalized Object Detection in Urban Scene [69.04189353993907]
We introduce a novel single-domain object detection generalization method, named GoDiff. By integrating pseudo-target domain data with source domain data, we diversify the training dataset. Experimental results demonstrate that our method not only enhances the generalization ability of existing detectors but also functions as a plug-and-play enhancement for other single-domain generalization methods.
arXiv Detail & Related papers (2024-12-18T13:03:00Z)
START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation [27.301312891532277]
Domain Generalization (DG) aims to enable models to generalize to unseen target domains by learning from multiple source domains. We propose START, which achieves state-of-the-art (SOTA) performances and offers a competitive alternative to CNNs and ViTs. Our START can selectively perturb and suppress domain-specific features in salient tokens within the input-dependent matrices of SSMs, thus effectively reducing the discrepancy between different domains.
arXiv Detail & Related papers (2024-10-21T13:50:32Z)
Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset. Existing SFDA methods ONLY assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets. We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation [48.039156140237615]
A Continual Test-Time Adaptation task is proposed to adapt the pre-trained model to continually changing target domains. We design a Visual Domain Adapter (ViDA) for CTTA, explicitly handling both domain-specific and domain-shared knowledge. Our proposed method achieves state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-06-07T11:18:53Z)
CNN Feature Map Augmentation for Single-Source Domain Generalization [6.053629733936548]
Domain Generalization (DG) has gained significant traction during the past few years. The goal in DG is to produce models which continue to perform well when presented with data distributions different from the ones available during training. We propose an alternative regularization technique for convolutional neural network architectures in the single-source DG image classification setting.
arXiv Detail & Related papers (2023-05-26T08:48:17Z)
Learning to Augment via Implicit Differentiation for Domain Generalization [107.9666735637355]
Domain generalization (DG) aims to overcome the problem by leveraging multiple source domains to learn a domain-generalizable model. In this paper, we propose a novel augmentation-based DG approach, dubbed AugLearn. AugLearn shows effectiveness on three standard DG benchmarks, PACS, Office-Home and Digits-DG.
arXiv Detail & Related papers (2022-10-25T18:51:51Z)
Federated Domain Generalization for Image Recognition via Cross-Client Style Transfer [60.70102634957392]
Domain generalization (DG) has been a hot topic in image recognition, with a goal to train a general model that can perform well on unseen domains. In this paper, we propose a novel domain generalization method for image recognition through cross-client style transfer (CCST) without exchanging data samples. Our method outperforms recent SOTA DG methods on two DG benchmarks (PACS, OfficeHome) and a large-scale medical image dataset (Camelyon17) in the FL setting.
arXiv Detail & Related papers (2022-10-03T13:15:55Z)
Boosting the Generalization Capability in Cross-Domain Few-shot Learning via Noise-enhanced Supervised Autoencoder [23.860842627883187]
We teach the model to capture broader variations of the feature distributions with a novel noise-enhanced supervised autoencoder (NSAE) NSAE trains the model by jointly reconstructing inputs and predicting the labels of inputs as well as their reconstructed pairs. We also take advantage of NSAE structure and propose a two-step fine-tuning procedure that achieves better adaption and improves classification performance in the target domain.
arXiv Detail & Related papers (2021-08-11T04:45:56Z)
Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area. Recent works on visual domain adaptation which leverage adversarial learning to unify the source and target video representations are not highly effective on the videos. This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z)
Learning Meta Face Recognition in Unseen Domains [74.69681594452125]
We propose a novel face recognition method via meta-learning named Meta Face Recognition (MFR) MFR synthesizes the source/target domain shift with a meta-optimization objective. We propose two benchmarks for generalized face recognition evaluation.
arXiv Detail & Related papers (2020-03-17T14:10:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.