ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency
Transform for Domain Generalization
- URL: http://arxiv.org/abs/2303.11674v2
- Date: Fri, 31 Mar 2023 11:55:55 GMT
- Title: ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency
Transform for Domain Generalization
- Authors: Jintao Guo, Na Wang, Lei Qi, Yinghuan Shi
- Abstract summary: Domain generalization (DG) aims to learn a model that generalizes well to unseen target domains utilizing multiple source domains without re-training.
Most existing DG works are based on convolutional neural networks (CNNs).
- Score: 15.057335610188545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Domain generalization (DG) aims to learn a model that generalizes well to
unseen target domains utilizing multiple source domains without re-training.
Most existing DG works are based on convolutional neural networks (CNNs).
However, the local operation of the convolution kernel makes the model focus
too much on local representations (e.g., texture), which inherently causes the
model more prone to overfit to the source domains and hampers its
generalization ability. Recently, several MLP-based methods have achieved
promising results in supervised learning tasks by learning global interactions
among different patches of the image. Inspired by this, in this paper, we first
analyze the difference between CNN and MLP methods in DG and find that MLP
methods exhibit a better generalization ability because they can better capture
the global representations (e.g., structure) than CNN methods. Then, based on a
recent lightweight MLP method, we obtain a strong baseline that outperforms
most state-of-the-art CNN-based methods. The baseline can learn global
structure representations with a filter to suppress structure-irrelevant
information in the frequency space. Moreover, we propose a dynAmic
LOw-Frequency spectrum Transform (ALOFT) that can perturb local texture
features while preserving global structure features, thus enabling the filter
to remove structure-irrelevant information sufficiently. Extensive experiments
on four benchmarks have demonstrated that our method can achieve great
performance improvement with a small number of parameters compared to SOTA
CNN-based DG methods. Our code is available at
https://github.com/lingeringlight/ALOFT/.
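The core idea in the abstract — perturb low-frequency amplitude, which mostly carries texture/style, while preserving phase, which mostly carries structure — can be sketched in a few lines. This is a hedged illustration, not the authors' released implementation: the low-frequency `ratio`, the noise scale `sigma`, and the multiplicative-Gaussian perturbation are illustrative assumptions; the actual ALOFT variants (which model low-frequency statistics as distributions) are in the linked repository.

```python
import numpy as np

def low_freq_amplitude_perturb(img, ratio=0.1, sigma=0.5, rng=None):
    """Perturb low-frequency amplitude (texture) while keeping phase (structure).

    img:   float array of shape (H, W) or (C, H, W)
    ratio: fraction of the spectrum (per side) treated as "low frequency"
    sigma: std of the multiplicative Gaussian noise on low-freq amplitudes
    """
    rng = np.random.default_rng() if rng is None else rng
    spec = np.fft.fft2(img, axes=(-2, -1))
    amp, pha = np.abs(spec), np.angle(spec)

    # Centre the spectrum so the low frequencies sit in the middle of the mask.
    amp = np.fft.fftshift(amp, axes=(-2, -1))
    h, w = img.shape[-2], img.shape[-1]
    ch, cw = h // 2, w // 2
    bh, bw = max(1, int(h * ratio)), max(1, int(w * ratio))

    # Multiplicative Gaussian noise on the low-frequency block only.
    block = amp[..., ch - bh:ch + bh, cw - bw:cw + bw]
    block *= 1.0 + sigma * rng.standard_normal(block.shape)

    # Recombine perturbed amplitude with the untouched phase and invert.
    amp = np.fft.ifftshift(amp, axes=(-2, -1))
    out = np.fft.ifft2(amp * np.exp(1j * pha), axes=(-2, -1))
    return np.real(out)
```

With `sigma=0` the transform is the identity (up to float error), which makes the structure-preserving property easy to check; in training, such perturbed images would be fed to the model so the frequency-space filter learns to discard structure-irrelevant content.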
Related papers
- Language Models are Graph Learners [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, including Graph Neural Networks (GNNs) and Graph Transformers (GTs)
We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z)
- Keypoint-Augmented Self-Supervised Learning for Medical Image Segmentation with Limited Annotation [21.203307064937142]
We present a keypoint-augmented fusion layer that extracts representations preserving both short- and long-range self-attention.
In particular, we augment the CNN feature map at multiple scales by incorporating an additional input that learns long-range spatial self-attention.
Our method further outperforms existing SSL methods by producing more robust self-attention.
arXiv Detail & Related papers (2023-10-02T22:31:30Z)
- CNN Feature Map Augmentation for Single-Source Domain Generalization [6.053629733936548]
Domain Generalization (DG) has gained significant traction during the past few years.
The goal in DG is to produce models which continue to perform well when presented with data distributions different from the ones available during training.
We propose an alternative regularization technique for convolutional neural network architectures in the single-source DG image classification setting.
arXiv Detail & Related papers (2023-05-26T08:48:17Z)
- Improving Convolutional Neural Networks for Fault Diagnosis by Assimilating Global Features [0.0]
This paper proposes a novel local-global CNN architecture that accounts for both local and global features for fault diagnosis.
The proposed LG-CNN can greatly improve the fault diagnosis performance without significantly increasing the model complexity.
arXiv Detail & Related papers (2022-10-03T16:49:16Z)
- FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization [73.41395947275473]
We propose a novel frequency-aware architecture, in which the domain-specific features are filtered out in the transformed frequency domain.
Experiments on three benchmarks demonstrate significant performance, outperforming the state-of-the-art methods by a margin of 3%, 4% and 9%, respectively.
arXiv Detail & Related papers (2022-03-24T07:26:29Z)
- RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality [113.1414517605892]
We propose a methodology, Locality Injection, to incorporate local priors into an FC layer.
RepMLPNet is the first vision MLP that seamlessly transfers to Cityscapes semantic segmentation.
arXiv Detail & Related papers (2021-12-21T10:28:17Z)
- Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
arXiv Detail & Related papers (2021-07-01T17:58:16Z)
- Video Salient Object Detection via Adaptive Local-Global Refinement [7.723369608197167]
Video salient object detection (VSOD) is an important task in many vision applications.
We propose an adaptive local-global refinement framework for VSOD.
We show that our weighting methodology can further exploit the feature correlations, thus driving the network to learn more discriminative feature representation.
arXiv Detail & Related papers (2021-04-29T14:14:11Z)
- Learning to Generalize Unseen Domains via Memory-based Multi-Source Meta-Learning for Person Re-Identification [59.326456778057384]
We propose the Memory-based Multi-Source Meta-Learning framework to train a generalizable model for unseen domains.
We also present a meta batch normalization layer (MetaBN) to diversify meta-test features.
Experiments demonstrate that our M$^3$L can effectively enhance the generalization ability of the model for unseen domains.
arXiv Detail & Related papers (2020-12-01T11:38:16Z)
- Learning Meta Face Recognition in Unseen Domains [74.69681594452125]
We propose a novel face recognition method via meta-learning, named Meta Face Recognition (MFR).
MFR synthesizes the source/target domain shift with a meta-optimization objective.
We propose two benchmarks for generalized face recognition evaluation.
arXiv Detail & Related papers (2020-03-17T14:10:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.