Adaptive Frequency Filters As Efficient Global Token Mixers
- URL: http://arxiv.org/abs/2307.14008v1
- Date: Wed, 26 Jul 2023 07:42:28 GMT
- Title: Adaptive Frequency Filters As Efficient Global Token Mixers
- Authors: Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Zheng-Jun Zha, Yan Lu,
Baining Guo
- Abstract summary: We show that adaptive frequency filters can serve as efficient global token mixers.
We take AFF token mixers as primary neural operators to build a lightweight neural network, dubbed AFFNet.
- Score: 100.27957692579892
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent vision transformers, large-kernel CNNs and MLPs have attained
remarkable successes in broad vision tasks thanks to their effective
information fusion in the global scope. However, their efficient deployments,
especially on mobile devices, still suffer from noteworthy challenges due to
the heavy computational costs of self-attention mechanisms, large kernels, or
fully connected layers. In this work, we apply the conventional convolution
theorem to deep learning to address this and reveal that adaptive frequency filters
can serve as efficient global token mixers. With this insight, we propose
Adaptive Frequency Filtering (AFF) token mixer. This neural operator transfers
a latent representation to the frequency domain via a Fourier transform and
performs semantic-adaptive frequency filtering via an elementwise
multiplication, which is mathematically equivalent to a token mixing operation in the
original latent space with a dynamic convolution kernel as large as the spatial
resolution of this latent representation. We take AFF token mixers as primary
neural operators to build a lightweight neural network, dubbed AFFNet.
Extensive experiments demonstrate the effectiveness of our proposed AFF token
mixer and show that AFFNet achieves superior accuracy-efficiency trade-offs
compared to other lightweight network designs on a broad range of visual tasks,
including visual recognition and dense prediction tasks.
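As a rough illustration of the frequency-domain token mixing described in the abstract, the sketch below implements a minimal adaptive frequency filter in PyTorch: the feature map is moved to the frequency domain with a 2D real FFT, multiplied elementwise by a filter predicted from the spectrum itself, and transformed back. This is not the official AFFNet code; in particular, the filter-prediction sub-network (a small 1x1-convolution MLP) is an assumption made for this example.

```python
# Minimal sketch of an adaptive frequency-filtering token mixer.
# Illustration of the idea only, NOT the official AFFNet implementation;
# the filter-prediction sub-network below is an assumption for this example.
import torch
import torch.nn as nn

class AdaptiveFrequencyFilter(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predicts a complex-valued filter (real + imaginary parts) per
        # frequency bin from the frequency-domain features themselves.
        self.filter_net = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) latent representation in the spatial domain.
        B, C, H, W = x.shape
        # 1) Move to the frequency domain (real FFT over the spatial dims).
        x_freq = torch.fft.rfft2(x, dim=(-2, -1), norm="ortho")  # (B, C, H, W//2+1), complex
        # 2) Predict a semantic-adaptive filter from the spectrum itself.
        feat = torch.cat([x_freq.real, x_freq.imag], dim=1)      # (B, 2C, H, W//2+1)
        filt = self.filter_net(feat)
        filt = torch.complex(filt[:, :C], filt[:, C:])            # complex-valued filter
        # 3) Elementwise multiplication in the frequency domain corresponds,
        #    by the convolution theorem, to a global dynamic convolution
        #    spanning the whole H x W grid in the spatial domain.
        x_freq = x_freq * filt
        # 4) Back to the spatial domain.
        return torch.fft.irfft2(x_freq, s=(H, W), dim=(-2, -1), norm="ortho")

# Usage: mix tokens of a (batch, channels, height, width) feature map.
if __name__ == "__main__":
    mixer = AdaptiveFrequencyFilter(channels=64)
    tokens = torch.randn(2, 64, 32, 32)
    mixed = mixer(tokens)
    print(mixed.shape)  # torch.Size([2, 64, 32, 32])
```

Because the frequency-domain product is equivalent to a spatial convolution whose support is the entire feature map, such a mixer aggregates information globally at roughly O(HW log HW) cost via the FFT, rather than the O((HW)^2) cost of self-attention.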
Related papers
- Band-Attention Modulated RetNet for Face Forgery Detection [44.0511745071837]
Transformer networks are extensively utilized in face forgery detection due to their scalability across large datasets.
We introduce Band-Attention modulated RetNet (BAR-Net), a lightweight network designed to efficiently process extensive visual contexts.
We present the adaptive frequency Band-Attention Modulation mechanism, which treats the entire Discrete Cosine Transform spectrogram as a series of frequency bands with learnable weights.
arXiv Detail & Related papers (2024-04-09T05:11:28Z) - TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic
Token Mixer for Visual Recognition [71.6546914957701]
We propose a lightweight Dual Dynamic Token Mixer (D-Mixer) that aggregates global information and local details in an input-dependent way.
We use D-Mixer as the basic building block to design TransXNet, a novel hybrid CNN-Transformer vision backbone network.
In the ImageNet-1K image classification task, TransXNet-T surpasses Swin-T by 0.3% in top-1 accuracy while requiring less than half of the computational cost.
arXiv Detail & Related papers (2023-10-30T09:35:56Z) - Joint Channel Estimation and Feedback with Masked Token Transformers in
Massive MIMO Systems [74.52117784544758]
This paper proposes an encoder-decoder based network that unveils the intrinsic frequency-domain correlation within the CSI matrix.
The entire encoder-decoder network is utilized for channel compression.
Our method outperforms state-of-the-art channel estimation and feedback techniques in joint tasks.
arXiv Detail & Related papers (2023-06-08T06:15:17Z) - Multiscale Attention via Wavelet Neural Operators for Vision
Transformers [0.0]
Transformers have achieved widespread success in computer vision. At their heart lies the Self-Attention (SA) mechanism.
The standard SA mechanism has quadratic complexity in the sequence length, which limits its utility for the long sequences that arise in high-resolution vision.
We introduce Multiscale Wavelet Attention (MWA) by leveraging wavelet neural operators, which incurs linear complexity in the sequence size.
arXiv Detail & Related papers (2023-03-22T09:06:07Z) - Efficient Frequency Domain-based Transformers for High-Quality Image
Deblurring [39.720032882926176]
We present an effective and efficient method that explores the properties of Transformers in the frequency domain for high-quality image deblurring.
We formulate the proposed FSAS and DFFN into an asymmetrical network based on an encoder and decoder architecture.
arXiv Detail & Related papers (2022-11-22T13:08:03Z) - Deep Frequency Filtering for Domain Generalization [55.66498461438285]
Deep Neural Networks (DNNs) have preferences for some frequency components in the learning process.
We propose Deep Frequency Filtering (DFF) for learning domain-generalizable features.
We show that applying our proposed DFF on a plain baseline outperforms the state-of-the-art methods on different domain generalization tasks.
arXiv Detail & Related papers (2022-03-23T05:19:06Z) - Adaptive Fourier Neural Operators: Efficient Token Mixers for
Transformers [55.90468016961356]
We propose an efficient token mixer that learns to mix in the Fourier domain.
AFNO is based on a principled foundation of operator learning.
It can handle a sequence size of 65k and outperforms other efficient self-attention mechanisms.
arXiv Detail & Related papers (2021-11-24T05:44:31Z) - Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
arXiv Detail & Related papers (2021-07-01T17:58:16Z)