Adaptive Fourier Neural Operators: Efficient Token Mixers for
Transformers
- URL: http://arxiv.org/abs/2111.13587v1
- Date: Wed, 24 Nov 2021 05:44:31 GMT
- Title: Adaptive Fourier Neural Operators: Efficient Token Mixers for
Transformers
- Authors: John Guibas, Morteza Mardani, Zongyi Li, Andrew Tao, Anima Anandkumar,
Bryan Catanzaro
- Abstract summary: We propose an efficient token mixer that learns to mix in the Fourier domain.
AFNO is based on a principled foundation of operator learning.
It can handle a sequence size of 65k and outperforms other efficient self-attention mechanisms.
- Score: 55.90468016961356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision transformers have delivered tremendous success in representation
learning. This is primarily due to effective token mixing through
self-attention. However, this scales quadratically with the number of pixels, which
becomes infeasible for high-resolution inputs. To cope with this challenge, we
propose Adaptive Fourier Neural Operator (AFNO) as an efficient token mixer
that learns to mix in the Fourier domain. AFNO is based on a principled
foundation of operator learning which allows us to frame token mixing as a
continuous global convolution without any dependence on the input resolution.
This principle was previously used to design FNO, which solves global
convolution efficiently in the Fourier domain and has shown promise in learning
challenging PDEs. To handle challenges in visual representation learning such
as discontinuities in images and high resolution inputs, we propose principled
architectural modifications to FNO which result in memory and computational
efficiency. This includes imposing a block-diagonal structure on the channel
mixing weights, adaptively sharing weights across tokens, and sparsifying the
frequency modes via soft-thresholding and shrinkage. The resulting model is
highly parallel with a quasi-linear complexity and has linear memory in the
sequence size. AFNO outperforms self-attention mechanisms for few-shot
segmentation in terms of both efficiency and accuracy. For Cityscapes
segmentation with the Segformer-B3 backbone, AFNO can handle a sequence size of
65k and outperforms other efficient self-attention mechanisms.
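
For concreteness, below is a minimal PyTorch sketch of an AFNO-style token mixer assembled from the ingredients named in the abstract: an FFT over the spatial token grid, a block-diagonal channel-mixing MLP whose weights are shared across tokens, soft-thresholding (shrinkage) of the frequency modes, and an inverse FFT with a residual connection. The module name and hyperparameters (num_blocks, sparsity_threshold, initialization scale) are illustrative assumptions for this sketch, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AFNOMixerSketch(nn.Module):
    """Hypothetical AFNO-style token mixer: FFT -> block-diagonal channel MLP
    (weights shared across tokens) -> soft-thresholding -> inverse FFT."""

    def __init__(self, dim, num_blocks=8, sparsity_threshold=0.01):
        super().__init__()
        assert dim % num_blocks == 0, "channel dim must split into equal blocks"
        self.num_blocks = num_blocks
        self.block_size = dim // num_blocks
        self.threshold = sparsity_threshold
        scale = 0.02
        # Block-diagonal channel-mixing weights; index 0/1 hold real/imaginary parts.
        self.w1 = nn.Parameter(scale * torch.randn(2, num_blocks, self.block_size, self.block_size))
        self.b1 = nn.Parameter(scale * torch.randn(2, num_blocks, self.block_size))
        self.w2 = nn.Parameter(scale * torch.randn(2, num_blocks, self.block_size, self.block_size))
        self.b2 = nn.Parameter(scale * torch.randn(2, num_blocks, self.block_size))

    def forward(self, x):
        # x: (B, H, W, C) grid of tokens.
        B, H, W, C = x.shape
        # 1) Global convolution performed as multiplication in the Fourier domain.
        z = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")            # (B, H, W//2+1, C), complex
        z = z.reshape(B, H, W // 2 + 1, self.num_blocks, self.block_size)
        zr, zi = z.real, z.imag
        # 2) Two-layer block-diagonal MLP on channels; the same weights are used for
        #    every frequency mode (weight sharing across tokens).
        hr = F.relu(torch.einsum("...bi,bio->...bo", zr, self.w1[0])
                    - torch.einsum("...bi,bio->...bo", zi, self.w1[1]) + self.b1[0])
        hi = F.relu(torch.einsum("...bi,bio->...bo", zr, self.w1[1])
                    + torch.einsum("...bi,bio->...bo", zi, self.w1[0]) + self.b1[1])
        out_r = (torch.einsum("...bi,bio->...bo", hr, self.w2[0])
                 - torch.einsum("...bi,bio->...bo", hi, self.w2[1]) + self.b2[0])
        out_i = (torch.einsum("...bi,bio->...bo", hr, self.w2[1])
                 + torch.einsum("...bi,bio->...bo", hi, self.w2[0]) + self.b2[1])
        # 3) Sparsify the frequency modes via soft-thresholding (shrinkage).
        out = F.softshrink(torch.stack([out_r, out_i], dim=-1), lambd=self.threshold)
        z = torch.view_as_complex(out.contiguous()).reshape(B, H, W // 2 + 1, C)
        # 4) Back to the token domain, with a residual connection.
        return torch.fft.irfft2(z, s=(H, W), dim=(1, 2), norm="ortho") + x


# Example: mix a 64x64 grid of 256-dim tokens (sequence length 4096).
tokens = torch.randn(2, 64, 64, 256)
mixed = AFNOMixerSketch(dim=256)(tokens)   # -> (2, 64, 64, 256)
```

A (B, H, W, C) token grid goes in and comes out at the same shape, so a layer of this form can stand in for a self-attention block inside a standard transformer block; the FFT is the only spatially global operation, which is where the quasi-linear complexity comes from.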
Related papers
- LeRF: Learning Resampling Function for Adaptive and Efficient Image Interpolation [64.34935748707673]
Recent deep neural networks (DNNs) have made impressive progress in performance by introducing learned data priors.
We propose a novel method of Learning Resampling (termed LeRF) which takes advantage of both the structural priors learned by DNNs and the locally continuous assumption.
LeRF assigns spatially varying resampling functions to input image pixels and learns to predict the shapes of these resampling functions with a neural network.
arXiv Detail & Related papers (2024-07-13T16:09:45Z)
- Invertible Fourier Neural Operators for Tackling Both Forward and Inverse Problems [18.48295539583625]
We propose an invertible Fourier Neural Operator (iFNO) that tackles both the forward and inverse problems.
We integrate a variational auto-encoder to capture the intrinsic structures within the input space and to enable posterior inference.
The evaluations on five benchmark problems have demonstrated the effectiveness of our approach.
arXiv Detail & Related papers (2024-02-18T22:16:43Z)
- Token Fusion: Bridging the Gap between Token Pruning and Token Merging [71.84591084401458]
Vision Transformers (ViTs) have emerged as powerful backbones in computer vision, outperforming many traditional CNNs.
However, their computational overhead, largely attributed to the self-attention mechanism, makes deployment on resource-constrained edge devices challenging.
We introduce "Token Fusion" (ToFu), a method that amalgamates the benefits of both token pruning and token merging.
arXiv Detail & Related papers (2023-12-02T04:29:19Z)
- Adaptive Frequency Filters As Efficient Global Token Mixers [100.27957692579892]
We show that adaptive frequency filters can serve as efficient global token mixers.
We take AFF token mixers as primary neural operators to build a lightweight neural network, dubbed AFFNet.
arXiv Detail & Related papers (2023-07-26T07:42:28Z)
- Multiscale Attention via Wavelet Neural Operators for Vision Transformers [0.0]
Transformers have achieved widespread success in computer vision. At their heart, there is a Self-Attention (SA) mechanism.
The standard SA mechanism has quadratic complexity in the sequence length, which impedes its use on the long sequences that arise in high-resolution vision.
We introduce a Multiscale Wavelet Attention (MWA) by leveraging wavelet neural operators which incurs linear complexity in the sequence size.
arXiv Detail & Related papers (2023-03-22T09:06:07Z)
- Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring [39.720032882926176]
We present an effective and efficient method that explores the properties of Transformers in the frequency domain for high-quality image deblurring.
We formulate the proposed FSAS and DFFN into an asymmetrical network based on an encoder and decoder architecture.
arXiv Detail & Related papers (2022-11-22T13:08:03Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
arXiv Detail & Related papers (2021-07-01T17:58:16Z)
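
As a note on the last entry: the global-filter idea in GFNet reduces, at its core, to an element-wise multiplication by a learnable complex filter in the 2D Fourier domain. The sketch below is a minimal, hedged illustration of such a layer; the (B, H, W, C) layout, the fixed filter resolution, and the module name are assumptions made for this example, not the paper's code.

```python
import torch
import torch.nn as nn


class GlobalFilterSketch(nn.Module):
    """Hypothetical GFNet-style layer: element-wise multiplication with a
    learnable complex filter in the 2D Fourier domain."""

    def __init__(self, dim, h=14, w=8):
        super().__init__()
        # One complex filter value per retained frequency mode and channel,
        # stored as a real tensor with a trailing (real, imag) axis.
        self.filter = nn.Parameter(0.02 * torch.randn(h, w, dim, 2))

    def forward(self, x):
        # x: (B, H, W, C); this fixed filter assumes H == 14 and W == 14,
        # so the one-sided spectrum has width W // 2 + 1 == 8.
        B, H, W, C = x.shape
        z = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")    # (B, H, W//2+1, C)
        z = z * torch.view_as_complex(self.filter)           # global filtering per mode/channel
        return torch.fft.irfft2(z, s=(H, W), dim=(1, 2), norm="ortho")


# Example: filter a 14x14 grid of 384-dim tokens.
feats = torch.randn(2, 14, 14, 384)
out = GlobalFilterSketch(dim=384)(feats)    # -> (2, 14, 14, 384)
```

Since the FFT is the only global operation, the cost scales as O(HW log HW) in the number of tokens, which is the log-linear complexity cited above; AFNO differs mainly in replacing the fixed per-mode filter with a shared block-diagonal MLP plus shrinkage.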
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.