Scaling Up Computer Vision Neural Networks Using Fast Fourier Transform
- URL: http://arxiv.org/abs/2302.12185v1
- Date: Thu, 2 Feb 2023 19:19:10 GMT
- Title: Scaling Up Computer Vision Neural Networks Using Fast Fourier Transform
- Authors: Siddharth Agrawal
- Abstract summary: The Deep Learning-based Computer Vision field has recently been exploring larger convolution kernels to effectively scale up Convolutional Neural Networks.
Newer paradigms of models such as Vision Transformers find it difficult to scale up to larger, higher-resolution images due to their quadratic complexity in input sequence length.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Deep Learning-based Computer Vision field has recently been exploring
larger convolution kernels to effectively scale up Convolutional Neural Networks.
Simultaneously, newer paradigms of models such as Vision Transformers find it
difficult to scale up to larger, higher-resolution images due to their quadratic
complexity in input sequence length. In this report, the Fast Fourier Transform is
utilised in various ways to provide solutions to these issues.
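The report itself is not reproduced on this page, but the mechanism that FFT-based approaches rely on is the convolution theorem: a convolution in the spatial domain is an element-wise product in the frequency domain, so the cost of applying a kernel via FFT does not grow with the kernel size. The sketch below is only an illustration of that idea under assumed sizes (the function name fft_conv2d, the 224x224 image and the 31x31 kernel are not from the paper):

```python
# Minimal sketch (not code from the paper): the convolution theorem lets a
# large spatial kernel be applied as an element-wise product of spectra,
# replacing the O(H*W*K*K) cost of direct convolution with the
# O(H*W*log(H*W)) cost of the FFTs.
import numpy as np

def fft_conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Circular 2D convolution of a single-channel image with a kernel via FFT."""
    H, W = image.shape
    padded_kernel = np.zeros((H, W), dtype=float)
    kh, kw = kernel.shape
    padded_kernel[:kh, :kw] = kernel  # zero-pad the kernel to the image size

    # Element-wise product of spectra == circular convolution in space.
    out_spectrum = np.fft.fft2(image) * np.fft.fft2(padded_kernel)
    return np.real(np.fft.ifft2(out_spectrum))

# Assumed example sizes: the FFT work for a 224x224 image is the same whether
# the kernel is 3x3 or 31x31, which is what makes large kernels cheap here.
image = np.random.rand(224, 224)
kernel = np.random.rand(31, 31)
out = fft_conv2d(image, kernel)
print(out.shape)  # (224, 224)
```

Note that this sketch computes a circular convolution; obtaining the usual zero-padded ("same") convolution would additionally require padding the image before the transforms.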
Related papers
- Optimizing Vision Transformers for Medical Image Segmentation and Few-Shot Domain Adaptation [11.690799827071606]
We propose Convolutional Swin-Unet (CS-Unet) transformer blocks and optimise their settings with respect to patch embedding, projection, the feed-forward network, upsampling and skip connections.
CS-Unet can be trained from scratch and inherits the superiority of convolutions in each feature processing phase.
Experiments show that CS-Unet without pre-training surpasses other state-of-the-art counterparts by large margins on two medical CT and MRI datasets with fewer parameters.
arXiv Detail & Related papers (2022-10-14T19:18:52Z)
- Video Frame Interpolation with Transformer [55.12620857638253]
We introduce a novel framework, which takes advantage of Transformer to model long-range pixel correlation among video frames.
Our network is equipped with a novel cross-scale window-based attention mechanism, where cross-scale windows interact with each other.
arXiv Detail & Related papers (2022-05-15T09:30:28Z)
- Batch Processing and Data Streaming Fourier-based Convolutional Neural Network Accelerator [4.7257913147626995]
Decision-making by artificial neural networks with minimal latency is paramount for numerous applications such as navigation, tracking, and real-time machine action systems.
This requires the machine learning hardware to handle multidimensional data with a high throughput.
We demonstrate non-von Neumann machine learning acceleration with a Fourier Convolutional Neural Network (FCNN) accelerator.
arXiv Detail & Related papers (2021-12-23T01:06:17Z)
- Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z)
- Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
(A toy sketch of this frequency-domain filtering idea is given after the related-papers list below.)
arXiv Detail & Related papers (2021-07-01T17:58:16Z)
- Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
The Less attention vIsion Transformer (LIT) builds upon the observation that convolutions, fully-connected layers, and self-attention have almost equivalent mathematical expressions for processing image patch sequences.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
arXiv Detail & Related papers (2021-05-29T05:26:07Z)
- Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from 'Vision-friendly Transformer'.
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
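The Global Filter Networks entry above is the closest of the related papers to this report's theme; the sketch below (not the GFNet authors' code; the function global_filter_layer, the 14x14 token grid and the random, untrained filter are assumptions for illustration) shows the kind of frequency-domain token mixing it describes:

```python
# Hedged sketch of a frequency-domain "global filter" layer: replace
# self-attention with (i) a 2D FFT over the token grid, (ii) an element-wise
# product with a complex filter, and (iii) an inverse FFT. Mixing is global
# but costs O(N log N) in the number of tokens N instead of O(N^2).
import numpy as np

def global_filter_layer(tokens: np.ndarray, freq_filter: np.ndarray) -> np.ndarray:
    """Forward pass of a toy global-filter layer.

    tokens:      (H, W, C) real-valued token/feature grid.
    freq_filter: (H, W // 2 + 1, C) complex filter; in a trainable network this
                 would be a learned parameter, not a fixed random array.
    """
    spectrum = np.fft.rfft2(tokens, axes=(0, 1))  # FFT over the spatial dims
    mixed = spectrum * freq_filter                # global, content-independent mixing
    return np.fft.irfft2(mixed, s=tokens.shape[:2], axes=(0, 1))

# Assumed example: a 14x14 grid of 64-dimensional tokens.
tokens = np.random.rand(14, 14, 64)
freq_filter = np.random.rand(14, 8, 64) + 1j * np.random.rand(14, 8, 64)
out = global_filter_layer(tokens, freq_filter)
print(out.shape)  # (14, 14, 64)
```

In the quadratic-complexity terms used in the abstract above, the element-wise product replaces the pairwise token interactions of self-attention, which is where the log-linear scaling mentioned in the GFNet summary comes from.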