SpiralMLP: A Lightweight Vision MLP Architecture
- URL: http://arxiv.org/abs/2404.00648v2
- Date: Tue, 3 Sep 2024 10:20:38 GMT
- Title: SpiralMLP: A Lightweight Vision MLP Architecture
- Authors: Haojie Mu, Burhan Ul Tayyab, Nicholas Chua
- Abstract summary: We present SpiralMLP, a novel architecture that introduces a Spiral FC layer as a replacement for the conventional Token Mixing approach.
Our study reveals that targeting the full receptive field is not essential for achieving high performance; instead, a more refined approach offers better results.
- Score: 0.27309692684728615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present SpiralMLP, a novel architecture that introduces a Spiral FC layer as a replacement for the conventional Token Mixing approach. Unlike several existing MLP-based models that primarily mix tokens along the spatial axes, our Spiral FC layer is designed as a deformable convolution layer with spiral-like offsets. We further adapt Spiral FC into two variants: Self-Spiral FC and Cross-Spiral FC, which together enable seamless local and global feature integration without additional processing steps. To investigate the effectiveness of the spiral-like offsets and validate our design, we conduct ablation studies and explore optimal configurations. In empirical tests, SpiralMLP reaches state-of-the-art performance on par with Transformers, CNNs, and other MLPs on the ImageNet-1k, COCO, and ADE20K benchmarks. SpiralMLP maintains linear computational complexity, O(HW) in the spatial dimensions, and is compatible with varying input image resolutions. Our study reveals that targeting the full receptive field is not essential for achieving high performance; instead, a more refined approach offers better results.
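The abstract pins down the mechanism only loosely, so the following is a minimal PyTorch sketch of the idea: sample a handful of points along a spiral around each pixel and mix the gathered features with a single fully connected layer. The Archimedean spiral parameterization, the tap count, and the use of F.grid_sample in place of a true deformable convolution are all illustrative assumptions, not the paper's exact design.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpiralFCSketch(nn.Module):
    """Toy Spiral FC: gather K features along a spiral around each pixel,
    then mix them with one fully connected layer. Cost stays O(HW)."""

    def __init__(self, dim, num_samples=8, max_radius=3.0):
        super().__init__()
        # Assumed Archimedean spiral: radius grows linearly with the angle.
        t = torch.arange(num_samples, dtype=torch.float32)
        theta = 2 * math.pi * t / num_samples            # one full turn
        radius = max_radius * (t + 1) / num_samples
        offsets = torch.stack([radius * torch.cos(theta),
                               radius * torch.sin(theta)], dim=-1)
        self.register_buffer("offsets", offsets)         # (K, 2) pixel offsets
        self.fc = nn.Linear(dim * num_samples, dim)

    def forward(self, x):                                # x: (B, C, H, W)
        B, C, H, W = x.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=x.device),
            torch.linspace(-1, 1, W, device=x.device),
            indexing="ij",
        )
        base = torch.stack([xs, ys], dim=-1)             # (H, W, 2), (x, y)
        taps = []
        for dx, dy in self.offsets:                      # K spiral taps
            # Convert pixel offsets to grid_sample's [-1, 1] coordinates.
            off = torch.stack([2 * dx / max(W - 1, 1),
                               2 * dy / max(H - 1, 1)])
            grid = (base + off).unsqueeze(0).expand(B, H, W, 2)
            taps.append(F.grid_sample(x, grid, align_corners=True))
        feats = torch.cat(taps, dim=1).permute(0, 2, 3, 1)   # (B, H, W, K*C)
        return self.fc(feats).permute(0, 3, 1, 2)            # (B, C, H, W)
```

For example, SpiralFCSketch(64)(torch.randn(2, 64, 32, 32)) returns a (2, 64, 32, 32) tensor; because each output position touches only a fixed number of taps, the cost grows linearly with H*W, consistent with the O(HW) complexity claimed above.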
Related papers
- SCHEME: Scalable Channel Mixer for Vision Transformers [52.605868919281086]
Vision Transformers have achieved impressive performance in many vision tasks.
Much less research has been devoted to the channel mixer or feature mixing block (FFN or MLP).
We show that the dense connections can be replaced with a diagonal block structure that supports larger expansion ratios (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-12-01T08:22:34Z)
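As a rough illustration of the block-diagonal idea (not SCHEME's exact design), a grouped 1x1 convolution realizes a block-diagonal linear layer: splitting the channel mixer into independent groups cuts parameters and FLOPs, freeing budget for a larger expansion ratio. The group count and ratio below are arbitrary.

```python
import torch
import torch.nn as nn

class BlockDiagonalFFN(nn.Module):
    """Block-diagonal channel mixer: each grouped 1x1 convolution is a
    block-diagonal linear map, so the dense FFN projections become
    `blocks` independent sub-mixers. Requires dim % blocks == 0."""

    def __init__(self, dim, expansion=8, blocks=4):
        super().__init__()
        self.up = nn.Conv2d(dim, dim * expansion, kernel_size=1, groups=blocks)
        self.act = nn.GELU()
        self.down = nn.Conv2d(dim * expansion, dim, kernel_size=1, groups=blocks)

    def forward(self, x):          # x: (B, C, H, W)
        return self.down(self.act(self.up(x)))
```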
- Dimension Mixer: Group Mixing of Input Dimensions for Efficient Function Approximation [11.072628804821083]
The success of CNNs, Transformers, and Fourier-Mixers motivated us to look for similarities and differences between them.
We found that these architectures can be interpreted through the lens of a general concept of dimension mixing.
In this work, we study group-wise sparse, non-linear, multi-layered and learnable mixing schemes of inputs and find that they are complementary to many standard neural architectures.
arXiv Detail & Related papers (2023-11-30T17:30:45Z)
- Caterpillar: A Pure-MLP Architecture with Shifted-Pillars-Concatenation [68.24659910441736]
The Shifted-Pillars-Concatenation (SPC) module offers superior local modeling power and performance gains (a minimal sketch follows this entry).
We build a pure-MLP architecture called Caterpillar by replacing the convolutional layer of a hybrid sMLPNet model with the SPC module.
Experiments show Caterpillar's excellent performance on both small-scale and ImageNet-1k classification benchmarks.
arXiv Detail & Related papers (2023-05-28T06:19:36Z)
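A minimal sketch of what Shifted-Pillars-Concatenation plausibly computes, judging from the name and summary alone: shift copies of the feature map toward the four axial directions, concatenate them, and fuse with a linear projection. The shift size and the circular torch.roll (the paper may zero-pad at borders instead) are assumptions.

```python
import torch
import torch.nn as nn

class SPCSketch(nn.Module):
    """Shift the feature map up/down/left/right, concatenate the four
    shifted copies, and fuse them channel-wise with one linear layer."""

    def __init__(self, dim, shift=1):
        super().__init__()
        self.shift = shift
        self.fuse = nn.Linear(4 * dim, dim)

    def forward(self, x):                            # x: (B, C, H, W)
        s = self.shift
        pillars = [torch.roll(x, s, dims=2),         # shift down
                   torch.roll(x, -s, dims=2),        # shift up
                   torch.roll(x, s, dims=3),         # shift right
                   torch.roll(x, -s, dims=3)]        # shift left
        y = torch.cat(pillars, dim=1).permute(0, 2, 3, 1)   # (B, H, W, 4C)
        return self.fuse(y).permute(0, 3, 1, 2)             # (B, C, H, W)
```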
- AdaSfM: From Coarse Global to Fine Incremental Adaptive Structure from Motion [48.835456049755166]
AdaSfM is a coarse-to-fine adaptive SfM approach that is scalable to large-scale and challenging datasets.
Our approach first performs a coarse global SfM, which improves the reliability of the view graph by leveraging measurements from low-cost sensors.
Our approach uses a threshold-adaptive strategy to align all local reconstructions to the coordinate frame of global SfM.
arXiv Detail & Related papers (2023-01-28T09:06:50Z)
- BiMLP: Compact Binary Architectures for Vision Multi-Layer Perceptrons [37.28828605119602]
This paper studies the problem of designing compact binary architectures for vision multi-layer perceptrons (MLPs).
We find that previous binarization methods perform poorly due to the limited capacity of binary sampling.
We propose to improve the binary MLP (BiMLP) model by enriching the representation ability of its binary FC layers (a generic binary FC layer is sketched after this entry).
arXiv Detail & Related papers (2022-12-29T02:43:41Z)
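To make "binary FC layer" concrete, here is a generic sketch of one: weights and activations are binarized to +-1 with a straight-through estimator, and a real-valued scale restores magnitude. This is the standard construction, not BiMLP's specific enrichment scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarySign(torch.autograd.Function):
    """sign() with a straight-through estimator: gradients pass where
    |input| <= 1, the usual trick for training binary networks."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()

class BinaryFC(nn.Module):
    """FC layer with +-1 weights and activations plus one real scale."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)

    def forward(self, x):
        scale = self.weight.abs().mean()             # real-valued magnitude
        return F.linear(BinarySign.apply(x),
                        BinarySign.apply(self.weight)) * scale
```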
- RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality [113.1414517605892]
We propose a methodology, Locality Injection, to incorporate local priors into an FC layer (the underlying conv-to-FC merge is sketched after this entry).
RepMLPNet is the first MLP that transfers seamlessly to Cityscapes semantic segmentation.
arXiv Detail & Related papers (2021-12-21T10:28:17Z)
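The mechanics behind merging a convolution into an FC layer can be sketched generically: on a fixed h x w map, a bias-free conv is a linear map, so feeding the identity basis through it yields the equivalent FC weight matrix. The feature sizes below are illustrative; this shows the re-parameterization principle, not RepMLPNet's full training recipe.

```python
import torch
import torch.nn as nn

def conv_to_fc_weight(conv: nn.Conv2d, h: int, w: int) -> torch.Tensor:
    """Return the FC weight equivalent to `conv` on an h x w feature map.
    Column j of the result is the conv's response to the j-th one-hot
    input, so the matrix reproduces the conv exactly. Assumes bias=False
    and 'same' padding."""
    cin = conv.in_channels
    eye = torch.eye(cin * h * w).reshape(cin * h * w, cin, h, w)
    with torch.no_grad():
        out = conv(eye)                              # (cin*h*w, cout, h, w)
    return out.reshape(cin * h * w, -1).t()          # (cout*h*w, cin*h*w)

# Hypothetical usage: inject a 3x3 conv's locality into an FC token mixer.
conv = nn.Conv2d(4, 4, kernel_size=3, padding=1, bias=False)
fc = nn.Linear(4 * 8 * 8, 4 * 8 * 8, bias=False)
fc.weight.data += conv_to_fc_weight(conv, 8, 8)
```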
- AS-MLP: An Axial Shifted MLP Architecture for Vision [50.11765148947432]
An Axial Shifted MLP architecture (AS-MLP) is proposed in this paper.
By axially shifting channels of the feature map, AS-MLP obtains information flow from different axial directions (the shift operation is sketched after this entry).
With the proposed AS-MLP architecture, our model obtains 83.3% Top-1 accuracy with 88M parameters and 15.2 GFLOPs on the ImageNet-1K dataset.
arXiv Detail & Related papers (2021-07-18T08:56:34Z)
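The axial shift itself is simple enough to sketch: split the channels into groups, shift each group by a different offset along one spatial axis, and let the following channel projection mix neighboring positions. torch.roll wraps circularly, whereas the paper zero-pads at borders; the shift size here is illustrative.

```python
import torch

def axial_shift(x, shift_size=3, dim=2):
    """Shift channel groups by -1, 0, +1, ... positions along `dim` so a
    subsequent 1x1 projection sees features from neighboring locations."""
    groups = torch.chunk(x, shift_size, dim=1)       # channel groups
    shifts = range(-(shift_size // 2), shift_size // 2 + 1)
    return torch.cat([torch.roll(g, s, dims=dim)
                      for g, s in zip(groups, shifts)], dim=1)

x = torch.randn(1, 96, 14, 14)
v = axial_shift(x, dim=2)                            # vertical branch
h = axial_shift(x, dim=3)                            # horizontal branch
```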
- Pushing the Envelope of Rotation Averaging for Visual SLAM [69.7375052440794]
We propose a novel optimization backbone for visual SLAM systems.
We leverage rotation averaging to improve the accuracy, efficiency, and robustness of conventional monocular SLAM systems.
Our approach can be up to 10x faster than the state of the art on public benchmarks, with comparable accuracy.
arXiv Detail & Related papers (2020-11-02T18:02:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.