START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation
- URL: http://arxiv.org/abs/2410.16020v1
- Date: Mon, 21 Oct 2024 13:50:32 GMT
- Title: START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation
- Authors: Jintao Guo, Lei Qi, Yinghuan Shi, Yang Gao
- Abstract summary: Domain Generalization (DG) aims to enable models to generalize to unseen target domains by learning from multiple source domains.
We propose START, which achieves state-of-the-art (SOTA) performance and offers a competitive alternative to CNNs and ViTs.
Our START can selectively perturb and suppress domain-specific features in salient tokens within the input-dependent matrices of SSMs, thus effectively reducing the discrepancy between different domains.
- Score: 27.301312891532277
- Abstract: Domain Generalization (DG) aims to enable models to generalize to unseen target domains by learning from multiple source domains. Existing DG methods primarily rely on convolutional neural networks (CNNs), which inherently learn texture biases due to their limited receptive fields, making them prone to overfitting source domains. While some works have introduced transformer-based methods (ViTs) for DG to leverage the global receptive field, these methods incur high computational costs due to the quadratic complexity of self-attention. Recently, advanced state space models (SSMs), represented by Mamba, have shown promising results in supervised learning tasks by achieving linear complexity in sequence length during training and fast RNN-like computation during inference. Inspired by this, we investigate the generalization ability of the Mamba model under domain shifts and find that input-dependent matrices within SSMs could accumulate and amplify domain-specific features, thus hindering model generalization. To address this issue, we propose a novel SSM-based architecture with saliency-based token-aware transformation (namely START), which achieves state-of-the-art (SOTA) performances and offers a competitive alternative to CNNs and ViTs. Our START can selectively perturb and suppress domain-specific features in salient tokens within the input-dependent matrices of SSMs, thus effectively reducing the discrepancy between different domains. Extensive experiments on five benchmarks demonstrate that START outperforms existing SOTA DG methods with efficient linear complexity. Our code is available at https://github.com/lingeringlight/START.
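The core operation described in the abstract is easiest to see in code. Below is a minimal PyTorch sketch of a saliency-driven token-aware transformation, assuming the input-dependent SSM parameters arrive as a (batch, seq_len, dim) projection; the saliency proxy (per-token activation norm), the perturbation ratio p, and all names are illustrative assumptions, not the authors' implementation.

```python
import torch

def saliency_token_perturb(x_proj: torch.Tensor, p: float = 0.25,
                           eps: float = 1e-6) -> torch.Tensor:
    """Perturb the most salient tokens of input-dependent SSM parameters.

    x_proj: (batch, seq_len, dim), e.g. the projection that yields
    Mamba's input-dependent B/C/Delta. Salient tokens (largest activation
    norm, a stand-in for the paper's saliency measure) get their per-token
    statistics mixed with those of another sample, perturbing and
    suppressing domain-specific components.
    """
    B, L, _ = x_proj.shape
    saliency = x_proj.norm(dim=-1)                  # (B, L) saliency proxy
    k = max(1, int(p * L))
    top = saliency.topk(k, dim=1).indices           # indices of salient tokens

    # Per-token mean/std and a batch-shuffled counterpart for mixing.
    mu = x_proj.mean(dim=-1, keepdim=True)
    sig = x_proj.std(dim=-1, keepdim=True) + eps
    perm = torch.randperm(B, device=x_proj.device)
    lam = torch.rand(B, 1, 1, device=x_proj.device)
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]

    # Re-style every token, then keep the perturbed version only
    # where the saliency mask is set.
    styled = (x_proj - mu) / sig * sig_mix + mu_mix
    mask = torch.zeros(B, L, 1, device=x_proj.device)
    mask.scatter_(1, top.unsqueeze(-1), 1.0)
    return torch.where(mask.bool(), styled, x_proj)
```

A transform like this would typically be applied only during training, with the identity used at inference.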
Related papers
- Domain-Guided Weight Modulation for Semi-Supervised Domain Generalization [11.392783918495404]
We study the challenging problem of semi-supervised domain generalization.
The goal is to learn a domain-generalizable model while using only a small fraction of labeled data and a relatively large fraction of unlabeled data.
We propose a novel method that can facilitate the generation of accurate pseudo-labels under various domain shifts (a generic pseudo-labeling step is sketched after this entry).
arXiv Detail & Related papers (2024-09-04T01:26:23Z)
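The paper's domain-guided weight modulation is not reproduced here; for context only, this is a generic confidence-thresholded pseudo-labeling step in the FixMatch style, with all names and the two-view setup assumed for illustration.

```python
import torch
import torch.nn.functional as F

def pseudo_label_step(model: torch.nn.Module, weak: torch.Tensor,
                      strong: torch.Tensor, tau: float = 0.95) -> torch.Tensor:
    """Confidence-thresholded pseudo-labeling on an unlabeled batch.

    weak/strong: weakly and strongly augmented views of the same images.
    Predictions on the weak view above confidence tau become targets
    for the strong view.
    """
    with torch.no_grad():
        probs = F.softmax(model(weak), dim=-1)
        conf, pseudo = probs.max(dim=-1)
        mask = conf >= tau                  # keep only confident samples
    logits = model(strong)
    if not mask.any():
        return logits.new_zeros(())         # no confident samples this step
    return F.cross_entropy(logits[mask], pseudo[mask])
```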
- PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model [77.00221501105788]
Domain Generalization (DG) has been recently explored to improve the generalizability of point cloud classification (PCC) models toward unseen domains.
We present the first work that studies the generalizability of state space models (SSMs) in DG PCC.
We propose a novel framework, PointDGMamba, that generalizes strongly to unseen domains.
arXiv Detail & Related papers (2024-08-24T12:53:48Z)
- Disentangling Masked Autoencoders for Unsupervised Domain Generalization [57.56744870106124]
Unsupervised domain generalization is rapidly gaining attention but remains far from well studied.
Disentangled Masked Autoencoders (DisMAE) aims to discover disentangled representations that faithfully reveal intrinsic features.
DisMAE co-trains an asymmetric dual-branch architecture with semantic and lightweight variation encoders (a minimal sketch follows this entry).
arXiv Detail & Related papers (2024-07-10T11:11:36Z)
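A minimal sketch of the asymmetric dual-branch idea: a heavier semantic encoder and a lightweight variation encoder whose outputs are fused before reconstruction. The layer sizes and the omission of MAE-style masking are assumptions made for brevity.

```python
import torch
import torch.nn as nn

class DualBranchAE(nn.Module):
    """Asymmetric dual-branch autoencoder sketch (masking omitted)."""
    def __init__(self, in_dim: int = 784, d: int = 128):
        super().__init__()
        # Heavier branch for intrinsic, semantic content.
        self.semantic = nn.Sequential(
            nn.Linear(in_dim, 256), nn.GELU(), nn.Linear(256, d))
        # Lightweight branch for nuisance, domain-style variation.
        self.variation = nn.Linear(in_dim, d)
        self.decoder = nn.Sequential(
            nn.Linear(d, 256), nn.GELU(), nn.Linear(256, in_dim))

    def forward(self, x: torch.Tensor):
        s, v = self.semantic(x), self.variation(x)
        recon = self.decoder(s + v)     # both codes needed to reconstruct
        return recon, s, v              # s: semantic code, v: variation code
```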
- Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation [59.41178047749177]
We focus on multi-domain Neural Machine Translation, with the goal of developing efficient models which can handle data from various domains seen during training and are robust to domains unseen during training.
We hypothesize that Sparse Mixture-of-Experts (SMoE) models are a good fit for this task, as they enable efficient model scaling.
We conduct a series of experiments to validate the utility of SMoE in the multi-domain scenario and find that straightforward width scaling of the Transformer is simpler, surprisingly more efficient in practice, and reaches the same performance level as SMoE (a minimal SMoE layer is sketched after this entry).
arXiv Detail & Related papers (2024-07-01T09:45:22Z)
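For context on the SMoE entry above, a minimal top-1 routed mixture-of-experts feed-forward layer; production systems add load-balancing losses and expert-capacity limits, which are omitted here, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Minimal top-1 sparse Mixture-of-Experts FFN sketch."""
    def __init__(self, d: int, n_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(d, n_experts)          # routing scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
            for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d)
        scores = self.gate(x).softmax(dim=-1)        # (tokens, n_experts)
        w, idx = scores.max(dim=-1)                  # top-1 routing weight/index
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = idx == e                           # tokens routed to expert e
            if sel.any():
                out[sel] = w[sel, None] * expert(x[sel])
        return out
```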
- StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization [85.18995948334592]
Single domain generalization (single DG) aims to learn a robust model that generalizes to unseen domains from only one training domain.
State-of-the-art approaches have mostly relied on data augmentations, such as adversarial perturbation and style enhancement, to synthesize new data.
We propose StyDeSty, which explicitly accounts for the alignment of the source and pseudo domains in the process of data augmentation (the min-max structure is sketched after this entry).
arXiv Detail & Related papers (2024-06-01T02:41:34Z)
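A sketch of the min-max shape only, with a hypothetical affine stylizer and an instance-norm destylizer; this is not the authors' architecture, just one plausible instantiation of stylization-versus-destylization.

```python
import torch
import torch.nn as nn

class Stylizer(nn.Module):
    """Hypothetical stylizer: learned per-channel affine perturbation,
    trained to *increase* the task loss (the 'max' player)."""
    def __init__(self, c: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(1, c, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, c, 1, 1))

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        return f * self.gamma + self.beta

class Destylizer(nn.Module):
    """Instance-norm destylizer: strips per-sample style statistics,
    trained with the backbone to *decrease* the task loss (the 'min' player)."""
    def __init__(self, c: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(c, affine=True)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        return self.norm(f)

# Training alternates: (1) ascend the task loss w.r.t. Stylizer parameters
# to synthesize a harder pseudo domain; (2) descend the task loss w.r.t.
# the backbone + Destylizer so stylized and clean features stay aligned.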
- DGMamba: Domain Generalization via Generalized State Space Model [80.82253601531164]
Domain generalization (DG) aims to solve distribution-shift problems across diverse scenarios.
Mamba, an emerging state space model (SSM), offers linear complexity and global receptive fields.
We propose a novel framework for DG, named DGMamba, that generalizes strongly to unseen domains (a minimal selective-SSM recurrence follows this entry).
arXiv Detail & Related papers (2024-04-11T14:35:59Z)
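Both DGMamba and the main START paper operate on Mamba-style selective SSMs, whose defining trait is that the matrices B and C and the step size Delta are functions of the input token, while A stays input-independent. A minimal, unoptimized recurrence is sketched below; the dimensions and projections are illustrative, and real implementations replace the Python loop with a fused parallel scan.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveSSM(nn.Module):
    """Minimal selective state space recurrence (Mamba-style sketch)."""
    def __init__(self, d: int, n: int = 16):
        super().__init__()
        self.A_log = nn.Parameter(torch.randn(d, n))  # log of state matrix
        self.proj_B = nn.Linear(d, n)                 # input-dependent B
        self.proj_C = nn.Linear(d, n)                 # input-dependent C
        self.proj_dt = nn.Linear(d, d)                # input-dependent step size

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, L, d)
        b, L, d = x.shape
        A = -torch.exp(self.A_log)                    # negative => stable decay
        dt = F.softplus(self.proj_dt(x))              # (b, L, d)
        Bm, Cm = self.proj_B(x), self.proj_C(x)       # (b, L, n) each
        h = x.new_zeros(b, d, self.A_log.size(1))     # hidden state (b, d, n)
        ys = []
        for t in range(L):                            # RNN-like inference path
            dA = torch.exp(dt[:, t, :, None] * A)     # discretized transition
            dB = dt[:, t, :, None] * Bm[:, t, None, :]
            h = dA * h + dB * x[:, t, :, None]        # state update
            ys.append((h * Cm[:, t, None, :]).sum(-1))  # readout: (b, d)
        return torch.stack(ys, dim=1)                 # (b, L, d)
```

The abstract's observation, that input-dependent matrices can accumulate and amplify domain-specific features, refers to Bm, Cm, and dt above; START applies its perturbation exactly there.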
- SETA: Semantic-Aware Token Augmentation for Domain Generalization [27.301312891532277]
Domain generalization (DG) aims to make models robust to domain shifts without access to target domains.
Prior CNN-based augmentation methods are suboptimal on token-based models because they do not encourage the model to learn holistic shape information.
We propose the SEmantic-aware Token Augmentation (SETA) method, which transforms features by perturbing local edge cues while preserving global shape features (a sketch follows this entry).
arXiv Detail & Related papers (2024-03-18T13:50:35Z)
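One concrete way to read "perturbing local edge cues while preserving global shape features": split a feature map into a smoothed, shape-carrying component and a residual edge component, then perturb only the residual. The average-pooling decomposition and batch-swap below are illustrative assumptions, not the paper's exact operators.

```python
import torch
import torch.nn.functional as F

def perturb_edge_cues(feat: torch.Tensor, noise_std: float = 0.1) -> torch.Tensor:
    """Perturb local edge cues while preserving global shape content.

    feat: (batch, C, H, W). The blurred map approximates global shape;
    the residual approximates local edges/texture, which get swapped
    across the batch and noised.
    """
    smooth = F.avg_pool2d(feat, kernel_size=3, stride=1, padding=1)
    edges = feat - smooth
    perm = torch.randperm(feat.size(0), device=feat.device)
    edges = edges[perm] + noise_std * torch.randn_like(edges)
    return smooth + edges
```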
- CNN Feature Map Augmentation for Single-Source Domain Generalization [6.053629733936548]
Domain Generalization (DG) has gained significant traction during the past few years.
The goal in DG is to produce models which continue to perform well when presented with data distributions different from the ones available during training.
We propose an alternative regularization technique for convolutional neural network architectures in the single-source DG image classification setting (a generic feature-map augmentation is sketched after this entry).
arXiv Detail & Related papers (2023-05-26T08:48:17Z)
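The entry above regularizes CNN feature maps directly. The generic sketch below injects Gaussian noise and channel dropout into intermediate activations during training, one common way to realize feature-map augmentation; the paper's exact transform may differ.

```python
import torch
import torch.nn as nn

class FeatureMapAugment(nn.Module):
    """Inject Gaussian noise and channel dropout into intermediate
    CNN activations; active only in training mode."""
    def __init__(self, noise_std: float = 0.1, drop_p: float = 0.1):
        super().__init__()
        self.noise_std, self.drop_p = noise_std, drop_p

    def forward(self, f: torch.Tensor) -> torch.Tensor:   # f: (B, C, H, W)
        if not self.training:
            return f
        f = f + self.noise_std * torch.randn_like(f)      # additive noise
        keep = (torch.rand(f.size(0), f.size(1), 1, 1,
                           device=f.device) > self.drop_p).float()
        return f * keep / (1.0 - self.drop_p)             # scaled channel dropout
```

A module like this can be dropped between backbone stages, e.g. nn.Sequential(stage1, FeatureMapAugment(), stage2).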
- TFS-ViT: Token-Level Feature Stylization for Domain Generalization [17.82872117103924]
Vision Transformers (ViTs) have shown outstanding performance for a broad range of computer vision tasks.
This paper presents the first Token-level Feature Stylization (TFS-ViT) approach for domain generalization.
Our approach transforms token features by mixing the normalization statistics of images from different domains (sketched after this entry).
arXiv Detail & Related papers (2023-03-28T03:00:28Z)
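A hedged sketch of that mixing, close in spirit to MixStyle applied at the token level: per-sequence token statistics are interpolated with those of another sample in the batch. The Beta(alpha, alpha) mixing weight and tensor layout are assumptions.

```python
import torch

def token_feature_stylize(tokens: torch.Tensor, alpha: float = 0.1,
                          eps: float = 1e-6) -> torch.Tensor:
    """Re-style token features with normalization statistics mixed
    across the batch (samples assumed to come from different domains).

    tokens: (batch, num_tokens, dim).
    """
    mu = tokens.mean(dim=1, keepdim=True)                 # per-sample stats
    sig = tokens.std(dim=1, keepdim=True) + eps
    perm = torch.randperm(tokens.size(0), device=tokens.device)
    dist = torch.distributions.Beta(alpha, alpha)
    lam = dist.sample((tokens.size(0), 1, 1)).to(tokens.device)
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return (tokens - mu) / sig * sig_mix + mu_mix
```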
- Learning to Augment via Implicit Differentiation for Domain Generalization [107.9666735637355]
Domain generalization (DG) aims to overcome the problem by leveraging multiple source domains to learn a domain-generalizable model.
In this paper, we propose a novel augmentation-based DG approach, dubbed AugLearn.
AugLearn shows effectiveness on three standard DG benchmarks, PACS, Office-Home and Digits-DG.
arXiv Detail & Related papers (2022-10-25T18:51:51Z)