FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization
- URL: http://arxiv.org/abs/2203.12893v1
- Date: Thu, 24 Mar 2022 07:26:29 GMT
- Title: FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization
- Authors: Kecheng Zheng, Yang Cao, Kai Zhu, Ruijing Zhao, Zheng-Jun Zha
- Abstract summary: We propose a novel frequency-aware architecture, in which the domain-specific features are filtered out in the transformed frequency domain.
Experiments on three benchmarks demonstrate strong generalization performance, outperforming state-of-the-art methods by margins of 3%, 4%, and 9%, respectively.
- Score: 73.41395947275473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: MLP-like models built entirely upon multi-layer perceptrons have recently
been revisited, exhibiting performance comparable to transformers. They are
among the most promising architectures due to their excellent trade-off between
network capability and efficiency in large-scale recognition tasks.
However, its generalization performance to heterogeneous tasks is inferior to
other architectures (e.g., CNNs and transformers) due to the extensive
retention of domain information. To address this problem, we propose a novel
frequency-aware MLP architecture, in which the domain-specific features are
filtered out in the transformed frequency domain, augmenting the invariant
descriptor for label prediction. Specifically, we design an adaptive Fourier
filter layer, in which a learnable frequency filter is utilized to adjust the
amplitude distribution by optimizing both the real and imaginary parts. A
low-rank enhancement module is further proposed to rectify the filtered
features by adding back the low-frequency components obtained via singular
value decomposition (SVD). Finally, a momentum update strategy stabilizes the
optimization against fluctuations in model parameters and inputs by distilling
outputs from weighted historical states. To the best of our knowledge, we are
the first to propose an MLP-like backbone for domain generalization. Extensive
experiments on three benchmarks demonstrate strong generalization performance,
outperforming state-of-the-art methods by margins of 3%, 4%, and 9%,
respectively.
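The two core components of the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the paper's implementation: the function names, shapes, and the choice of top-singular-value components as the "low-frequency" part are assumptions for demonstration.

```python
import numpy as np

def adaptive_fourier_filter(x, filt_real, filt_imag):
    """Sketch of an adaptive Fourier filter layer (names and shapes hypothetical).

    x: (H, W, C) feature map; filt_real / filt_imag: learnable (H, W, C) arrays
    forming a complex filter applied to the real and imaginary spectrum parts.
    """
    spec = np.fft.fft2(x, axes=(0, 1))               # transform to the frequency domain
    filt = filt_real + 1j * filt_imag                # learnable complex-valued filter
    filtered = spec * filt                           # reweight the amplitude distribution
    return np.fft.ifft2(filtered, axes=(0, 1)).real  # back to the spatial domain

def low_rank_enhancement(x, rank=4):
    """Add back the top-`rank` SVD components of each channel (assumed rectification)."""
    out = np.empty_like(x)
    for c in range(x.shape[-1]):
        u, s, vt = np.linalg.svd(x[..., c], full_matrices=False)
        out[..., c] = x[..., c] + (u[:, :rank] * s[:rank]) @ vt[:rank]
    return out

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 3))
ones_r, zeros_i = np.ones_like(feat), np.zeros_like(feat)
# An all-ones real filter with zero imaginary part acts as the identity:
y = adaptive_fourier_filter(feat, ones_r, zeros_i)
print(np.allclose(y, feat))  # → True
```

In the paper the filter parameters would be learned by backpropagation; the identity check above only verifies that the transform round-trips correctly.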
Related papers
- FE-UNet: Frequency Domain Enhanced U-Net with Segment Anything Capability for Versatile Image Segmentation [50.9040167152168]
We experimentally quantify the contrast sensitivity function of CNNs and compare it with that of the human visual system.
We propose the Wavelet-Guided Spectral Pooling Module (WSPM) to enhance and balance image features across the frequency domain.
To further emulate the human visual system, we introduce the Frequency Domain Enhanced Receptive Field Block (FE-RFB)
We develop FE-UNet, a model that utilizes SAM2 as its backbone and incorporates Hiera-Large as a pre-trained block.
arXiv Detail & Related papers (2025-02-06T07:24:34Z) - FreqMixFormerV2: Lightweight Frequency-aware Mixed Transformer for Human Skeleton Action Recognition [9.963966059349731]
FreqMixFormerV2 is built upon the Frequency-aware Mixed Transformer (FreqMixFormer) for identifying subtle and discriminative actions.
The proposed model achieves a superior balance between efficiency and accuracy, outperforming state-of-the-art methods with only 60% of the parameters.
arXiv Detail & Related papers (2024-12-29T23:52:40Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - Exploring Frequency-Inspired Optimization in Transformer for Efficient Single Image Super-Resolution [32.29219284419944]
We propose the cross-refinement adaptive feature modulation transformer (CRAFT).
We introduce a frequency-guided post-training quantization (PTQ) method aimed at enhancing CRAFT's efficiency.
Our experimental findings showcase CRAFT's superiority over current state-of-the-art methods.
arXiv Detail & Related papers (2023-08-09T15:38:36Z) - Fourier Test-time Adaptation with Multi-level Consistency for Robust Classification [10.291631977766672]
We propose a novel approach called Fourier Test-time Adaptation (FTTA) to integrate input and model tuning.
FTTA builds a reliable multi-level consistency measurement of paired inputs to achieve self-supervised adaptation of predictions.
It was extensively validated on three large classification datasets with different modalities and organs.
arXiv Detail & Related papers (2023-06-05T02:29:38Z) - Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z) - Inception Transformer [151.939077819196]
Inception Transformer, or iFormer, learns comprehensive features with both high- and low-frequency information in visual data.
We benchmark the iFormer on a series of vision tasks, and showcase that it achieves impressive performance on image classification, COCO detection and ADE20K segmentation.
arXiv Detail & Related papers (2022-05-25T17:59:54Z) - Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
arXiv Detail & Related papers (2021-07-01T17:58:16Z)
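The GFNet entry above describes mixing spatial information by elementwise filtering in the frequency domain, which is what gives the log-linear complexity. A minimal sketch of such a global filter layer, with assumed shapes and an untrained (hand-set) filter:

```python
import numpy as np

def global_filter(x, freq_filter):
    """Minimal sketch of a GFNet-style global filter layer (assumed form).

    x: (H, W, C) features on a 2D grid; freq_filter: complex (H, W//2+1, C)
    learnable filter. The FFT/iFFT pair costs O(HW log HW), i.e. log-linear
    in the number of spatial positions.
    """
    spec = np.fft.rfft2(x, axes=(0, 1))   # real 2D FFT over the spatial grid
    spec = spec * freq_filter             # global spatial mixing via elementwise product
    return np.fft.irfft2(spec, s=x.shape[:2], axes=(0, 1))

rng = np.random.default_rng(1)
tokens = rng.standard_normal((14, 14, 8))
ident = np.ones((14, 14 // 2 + 1, 8), dtype=complex)
print(np.allclose(global_filter(tokens, ident), tokens))  # → True (identity filter)
```

Because the filter multiplies every frequency of the whole grid, a single layer mixes all spatial positions at once, unlike the local receptive field of a convolution.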
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.