START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation
- URL: http://arxiv.org/abs/2410.16020v1
- Date: Mon, 21 Oct 2024 13:50:32 GMT
- Title: START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation
- Authors: Jintao Guo, Lei Qi, Yinghuan Shi, Yang Gao
- Abstract summary: Domain Generalization (DG) aims to enable models to generalize to unseen target domains by learning from multiple source domains.
We propose START, which achieves state-of-the-art (SOTA) performance and offers a competitive alternative to CNNs and ViTs.
Our START can selectively perturb and suppress domain-specific features in salient tokens within the input-dependent matrices of SSMs, thus effectively reducing the discrepancy between different domains.
- Score: 27.301312891532277
- Abstract: Domain Generalization (DG) aims to enable models to generalize to unseen target domains by learning from multiple source domains. Existing DG methods primarily rely on convolutional neural networks (CNNs), which inherently learn texture biases due to their limited receptive fields, making them prone to overfitting source domains. While some works have introduced transformer-based methods (ViTs) for DG to leverage the global receptive field, these methods incur high computational costs due to the quadratic complexity of self-attention. Recently, advanced state space models (SSMs), represented by Mamba, have shown promising results in supervised learning tasks by achieving linear complexity in sequence length during training and fast RNN-like computation during inference. Inspired by this, we investigate the generalization ability of the Mamba model under domain shifts and find that input-dependent matrices within SSMs could accumulate and amplify domain-specific features, thus hindering model generalization. To address this issue, we propose a novel SSM-based architecture with saliency-based token-aware transformation (namely START), which achieves state-of-the-art (SOTA) performances and offers a competitive alternative to CNNs and ViTs. Our START can selectively perturb and suppress domain-specific features in salient tokens within the input-dependent matrices of SSMs, thus effectively reducing the discrepancy between different domains. Extensive experiments on five benchmarks demonstrate that START outperforms existing SOTA DG methods with efficient linear complexity. Our code is available at https://github.com/lingeringlight/START.
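The core operation described in the abstract is easiest to see in code. Below is a minimal PyTorch sketch of a saliency-driven token-aware transformation, assuming the input-dependent SSM parameters arrive as a (batch, seq_len, dim) projection; the saliency proxy (per-token activation norm), the perturbation ratio p, and all names are illustrative assumptions, not the authors' implementation.

```python
import torch

def saliency_token_perturb(x_proj: torch.Tensor, p: float = 0.25,
                           eps: float = 1e-6) -> torch.Tensor:
    """Perturb the most salient tokens of input-dependent SSM parameters.

    x_proj: (batch, seq_len, dim), e.g. the projection that yields
    Mamba's input-dependent B/C/Delta. Salient tokens (largest activation
    norm, a stand-in for the paper's saliency measure) get their per-token
    statistics mixed with those of another sample, perturbing and
    suppressing domain-specific components.
    """
    B, L, _ = x_proj.shape
    saliency = x_proj.norm(dim=-1)                  # (B, L) saliency proxy
    k = max(1, int(p * L))
    top = saliency.topk(k, dim=1).indices           # indices of salient tokens

    # Per-token mean/std and a batch-shuffled counterpart for mixing.
    mu = x_proj.mean(dim=-1, keepdim=True)
    sig = x_proj.std(dim=-1, keepdim=True) + eps
    perm = torch.randperm(B, device=x_proj.device)
    lam = torch.rand(B, 1, 1, device=x_proj.device)
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]

    # Re-style every token, then keep the perturbed version only
    # where the saliency mask is set.
    styled = (x_proj - mu) / sig * sig_mix + mu_mix
    mask = torch.zeros(B, L, 1, device=x_proj.device)
    mask.scatter_(1, top.unsqueeze(-1), 1.0)
    return torch.where(mask.bool(), styled, x_proj)
```

A transform like this would typically be applied only during training, with the identity used at inference.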
Related papers
- Domain-Guided Weight Modulation for Semi-Supervised Domain Generalization [11.392783918495404]
We study the challenging problem of semi-supervised domain generalization.
The goal is to learn a domain-generalizable model while using only a small fraction of labeled data and a relatively large fraction of unlabeled data.
We propose a novel method that can facilitate the generation of accurate pseudo-labels under various domain shifts (a generic pseudo-labeling step is sketched after this entry).
arXiv Detail & Related papers (2024-09-04T01:26:23Z)
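The paper's domain-guided weight modulation is not reproduced here; for context only, this is a generic confidence-thresholded pseudo-labeling step in the FixMatch style, with all names and the two-view setup assumed for illustration.

```python
import torch
import torch.nn.functional as F

def pseudo_label_step(model: torch.nn.Module, weak: torch.Tensor,
                      strong: torch.Tensor, tau: float = 0.95) -> torch.Tensor:
    """Confidence-thresholded pseudo-labeling on an unlabeled batch.

    weak/strong: weakly and strongly augmented views of the same images.
    Predictions on the weak view above confidence tau become targets
    for the strong view.
    """
    with torch.no_grad():
        probs = F.softmax(model(weak), dim=-1)
        conf, pseudo = probs.max(dim=-1)
        mask = conf >= tau                  # keep only confident samples
    logits = model(strong)
    if not mask.any():
        return logits.new_zeros(())         # no confident samples this step
    return F.cross_entropy(logits[mask], pseudo[mask])
```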
- PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model [77.00221501105788]
Domain Generalization (DG) has been recently explored to improve the generalizability of point cloud classification (PCC) models toward unseen domains.
We present the first work that studies the generalizability of state space models (SSMs) in DG PCC.
We propose a novel framework, PointDGMamba, that generalizes strongly to unseen domains.
arXiv Detail & Related papers (2024-08-24T12:53:48Z)
- Disentangling Masked Autoencoders for Unsupervised Domain Generalization [57.56744870106124]
Unsupervised domain generalization is rapidly gaining attention but remains far from well studied.
Disentangled Masked Autoencoders (DisMAE) aims to discover disentangled representations that faithfully reveal intrinsic features.
DisMAE co-trains an asymmetric dual-branch architecture with semantic and lightweight variation encoders (a minimal sketch follows this entry).
arXiv Detail & Related papers (2024-07-10T11:11:36Z)
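A minimal sketch of the asymmetric dual-branch idea: a heavier semantic encoder and a lightweight variation encoder whose outputs are fused before reconstruction. The layer sizes and the omission of MAE-style masking are assumptions made for brevity.

```python
import torch
import torch.nn as nn

class DualBranchAE(nn.Module):
    """Asymmetric dual-branch autoencoder sketch (masking omitted)."""
    def __init__(self, in_dim: int = 784, d: int = 128):
        super().__init__()
        # Heavier branch for intrinsic, semantic content.
        self.semantic = nn.Sequential(
            nn.Linear(in_dim, 256), nn.GELU(), nn.Linear(256, d))
        # Lightweight branch for nuisance, domain-style variation.
        self.variation = nn.Linear(in_dim, d)
        self.decoder = nn.Sequential(
            nn.Linear(d, 256), nn.GELU(), nn.Linear(256, in_dim))

    def forward(self, x: torch.Tensor):
        s, v = self.semantic(x), self.variation(x)
        recon = self.decoder(s + v)     # both codes needed to reconstruct
        return recon, s, v              # s: semantic code, v: variation code
```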
- Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation [59.41178047749177]
We focus on multi-domain Neural Machine Translation, with the goal of developing efficient models which can handle data from various domains seen during training and are robust to domains unseen during training.
We hypothesize that Sparse Mixture-of-Experts (SMoE) models are a good fit for this task, as they enable efficient model scaling.
We conduct a series of experiments to validate the utility of SMoE in the multi-domain scenario and find that straightforward width scaling of the Transformer is simpler, surprisingly more efficient in practice, and reaches the same performance level as SMoE (a minimal SMoE layer is sketched after this entry).
arXiv Detail & Related papers (2024-07-01T09:45:22Z)
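For context on the SMoE entry above, a minimal top-1 routed mixture-of-experts feed-forward layer; production systems add load-balancing losses and expert-capacity limits, which are omitted here, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Minimal top-1 sparse Mixture-of-Experts FFN sketch."""
    def __init__(self, d: int, n_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(d, n_experts)          # routing scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
            for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d)
        scores = self.gate(x).softmax(dim=-1)        # (tokens, n_experts)
        w, idx = scores.max(dim=-1)                  # top-1 routing weight/index
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = idx == e                           # tokens routed to expert e
            if sel.any():
                out[sel] = w[sel, None] * expert(x[sel])
        return out
```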
- StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization [85.18995948334592]
Single domain generalization (single DG) aims to learn a robust model that generalizes to unseen domains from only one training domain.
State-of-the-art approaches have mostly relied on data augmentations, such as adversarial perturbation and style enhancement, to synthesize new data.
We propose StyDeSty, which explicitly accounts for the alignment of the source and pseudo domains in the process of data augmentation (the min-max structure is sketched after this entry).
arXiv Detail & Related papers (2024-06-01T02:41:34Z)
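A sketch of the min-max shape only, with a hypothetical affine stylizer and an instance-norm destylizer; this is not the authors' architecture, just one plausible instantiation of stylization-versus-destylization.

```python
import torch
import torch.nn as nn

class Stylizer(nn.Module):
    """Hypothetical stylizer: learned per-channel affine perturbation,
    trained to *increase* the task loss (the 'max' player)."""
    def __init__(self, c: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(1, c, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, c, 1, 1))

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        return f * self.gamma + self.beta

class Destylizer(nn.Module):
    """Instance-norm destylizer: strips per-sample style statistics,
    trained with the backbone to *decrease* the task loss (the 'min' player)."""
    def __init__(self, c: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(c, affine=True)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        return self.norm(f)

# Training alternates: (1) ascend the task loss w.r.t. Stylizer parameters
# to synthesize a harder pseudo domain; (2) descend the task loss w.r.t.
# the backbone + Destylizer so stylized and clean features stay aligned.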
- DGMamba: Domain Generalization via Generalized State Space Model [80.82253601531164]
Domain generalization (DG) aims to solve distribution-shift problems across diverse scenarios.
Mamba, an emerging state space model (SSM), offers linear complexity and global receptive fields.
We propose a novel framework for DG, named DGMamba, that generalizes strongly to unseen domains (a minimal selective-SSM recurrence follows this entry).
arXiv Detail & Related papers (2024-04-11T14:35:59Z)
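Both DGMamba and the main START paper operate on Mamba-style selective SSMs, whose defining trait is that the matrices B and C and the step size Delta are functions of the input token, while A stays input-independent. A minimal, unoptimized recurrence is sketched below; the dimensions and projections are illustrative, and real implementations replace the Python loop with a fused parallel scan.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveSSM(nn.Module):
    """Minimal selective state space recurrence (Mamba-style sketch)."""
    def __init__(self, d: int, n: int = 16):
        super().__init__()
        self.A_log = nn.Parameter(torch.randn(d, n))  # log of state matrix
        self.proj_B = nn.Linear(d, n)                 # input-dependent B
        self.proj_C = nn.Linear(d, n)                 # input-dependent C
        self.proj_dt = nn.Linear(d, d)                # input-dependent step size

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, L, d)
        b, L, d = x.shape
        A = -torch.exp(self.A_log)                    # negative => stable decay
        dt = F.softplus(self.proj_dt(x))              # (b, L, d)
        Bm, Cm = self.proj_B(x), self.proj_C(x)       # (b, L, n) each
        h = x.new_zeros(b, d, self.A_log.size(1))     # hidden state (b, d, n)
        ys = []
        for t in range(L):                            # RNN-like inference path
            dA = torch.exp(dt[:, t, :, None] * A)     # discretized transition
            dB = dt[:, t, :, None] * Bm[:, t, None, :]
            h = dA * h + dB * x[:, t, :, None]        # state update
            ys.append((h * Cm[:, t, None, :]).sum(-1))  # readout: (b, d)
        return torch.stack(ys, dim=1)                 # (b, L, d)
```

The abstract's observation, that input-dependent matrices can accumulate and amplify domain-specific features, refers to Bm, Cm, and dt above; START applies its perturbation exactly there.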
- SETA: Semantic-Aware Token Augmentation for Domain Generalization [27.301312891532277]
Domain generalization (DG) aims to make models robust to domain shifts without access to target domains.
Prior CNN-based augmentation methods are suboptimal on token-based models because they do not encourage the model to learn holistic shape information.
We propose the SEmantic-aware Token Augmentation (SETA) method, which transforms features by perturbing local edge cues while preserving global shape features (a sketch follows this entry).
arXiv Detail & Related papers (2024-03-18T13:50:35Z)
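One concrete way to read "perturbing local edge cues while preserving global shape features": split a feature map into a smoothed, shape-carrying component and a residual edge component, then perturb only the residual. The average-pooling decomposition and batch-swap below are illustrative assumptions, not the paper's exact operators.

```python
import torch
import torch.nn.functional as F

def perturb_edge_cues(feat: torch.Tensor, noise_std: float = 0.1) -> torch.Tensor:
    """Perturb local edge cues while preserving global shape content.

    feat: (batch, C, H, W). The blurred map approximates global shape;
    the residual approximates local edges/texture, which get swapped
    across the batch and noised.
    """
    smooth = F.avg_pool2d(feat, kernel_size=3, stride=1, padding=1)
    edges = feat - smooth
    perm = torch.randperm(feat.size(0), device=feat.device)
    edges = edges[perm] + noise_std * torch.randn_like(edges)
    return smooth + edges
```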
- CNN Feature Map Augmentation for Single-Source Domain Generalization [6.053629733936548]
Domain Generalization (DG) has gained significant traction during the past few years.
The goal in DG is to produce models which continue to perform well when presented with data distributions different from the ones available during training.
We propose an alternative regularization technique for convolutional neural network architectures in the single-source DG image classification setting (a generic feature-map augmentation is sketched after this entry).
arXiv Detail & Related papers (2023-05-26T08:48:17Z)
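The entry above regularizes CNN feature maps directly. The generic sketch below injects Gaussian noise and channel dropout into intermediate activations during training, one common way to realize feature-map augmentation; the paper's exact transform may differ.

```python
import torch
import torch.nn as nn

class FeatureMapAugment(nn.Module):
    """Inject Gaussian noise and channel dropout into intermediate
    CNN activations; active only in training mode."""
    def __init__(self, noise_std: float = 0.1, drop_p: float = 0.1):
        super().__init__()
        self.noise_std, self.drop_p = noise_std, drop_p

    def forward(self, f: torch.Tensor) -> torch.Tensor:   # f: (B, C, H, W)
        if not self.training:
            return f
        f = f + self.noise_std * torch.randn_like(f)      # additive noise
        keep = (torch.rand(f.size(0), f.size(1), 1, 1,
                           device=f.device) > self.drop_p).float()
        return f * keep / (1.0 - self.drop_p)             # scaled channel dropout
```

A module like this can be dropped between backbone stages, e.g. nn.Sequential(stage1, FeatureMapAugment(), stage2).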
- TFS-ViT: Token-Level Feature Stylization for Domain Generalization [17.82872117103924]
Vision Transformers (ViTs) have shown outstanding performance for a broad range of computer vision tasks.
This paper presents the first Token-level Feature Stylization (TFS-ViT) approach for domain generalization.
Our approach transforms token features by mixing the normalization statistics of images from different domains (sketched after this entry).
arXiv Detail & Related papers (2023-03-28T03:00:28Z)
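A hedged sketch of that mixing, close in spirit to MixStyle applied at the token level: per-sequence token statistics are interpolated with those of another sample in the batch. The Beta(alpha, alpha) mixing weight and tensor layout are assumptions.

```python
import torch

def token_feature_stylize(tokens: torch.Tensor, alpha: float = 0.1,
                          eps: float = 1e-6) -> torch.Tensor:
    """Re-style token features with normalization statistics mixed
    across the batch (samples assumed to come from different domains).

    tokens: (batch, num_tokens, dim).
    """
    mu = tokens.mean(dim=1, keepdim=True)                 # per-sample stats
    sig = tokens.std(dim=1, keepdim=True) + eps
    perm = torch.randperm(tokens.size(0), device=tokens.device)
    dist = torch.distributions.Beta(alpha, alpha)
    lam = dist.sample((tokens.size(0), 1, 1)).to(tokens.device)
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return (tokens - mu) / sig * sig_mix + mu_mix
```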
- Learning to Augment via Implicit Differentiation for Domain Generalization [107.9666735637355]
Domain generalization (DG) aims to overcome the problem by leveraging multiple source domains to learn a domain-generalizable model.
In this paper, we propose a novel augmentation-based DG approach, dubbed AugLearn.
AugLearn shows effectiveness on three standard DG benchmarks, PACS, Office-Home and Digits-DG.
arXiv Detail & Related papers (2022-10-25T18:51:51Z)