Hire-MLP: Vision MLP via Hierarchical Rearrangement
- URL: http://arxiv.org/abs/2108.13341v1
- Date: Mon, 30 Aug 2021 16:11:04 GMT
- Title: Hire-MLP: Vision MLP via Hierarchical Rearrangement
- Authors: Jianyuan Guo, Yehui Tang, Kai Han, Xinghao Chen, Han Wu, Chao Xu,
Chang Xu and Yunhe Wang
- Abstract summary: Hire-MLP is a simple yet competitive vision MLP architecture built via hierarchical rearrangement.
The proposed Hire-MLP architecture is built with simple channel-mixing MLPs and rearrangement operations, and thus enjoys high flexibility and fast inference.
Experiments show that our Hire-MLP achieves state-of-the-art performance on the ImageNet-1K benchmark.
- Score: 58.33383667626998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents Hire-MLP, a simple yet competitive vision MLP
architecture via hierarchical rearrangement. Previous vision MLPs like
MLP-Mixer are not flexible for various image sizes and are inefficient at
capturing spatial information because they flatten the tokens. Hire-MLP improves
upon existing MLP-based models by introducing the idea of hierarchical
rearrangement to aggregate local and global spatial information while remaining versatile for
downstream tasks. Specifically, the inner-region rearrangement is designed to
capture local information inside a spatial region. Moreover, to enable
information communication between different regions and capture global context,
the cross-region rearrangement is proposed to circularly shift all tokens along
spatial directions. The proposed Hire-MLP architecture is built with simple
channel-mixing MLPs and rearrangement operations, and thus enjoys high flexibility
and fast inference. Experiments show that our Hire-MLP achieves
state-of-the-art performance on the ImageNet-1K benchmark. In particular,
Hire-MLP achieves 83.4% top-1 accuracy on ImageNet, surpassing previous
Transformer-based and MLP-based models with a better trade-off between accuracy
and throughput.
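A minimal sketch of the two operations described above, assuming a PyTorch-style implementation: inner-region rearrangement folds the tokens of each region along the height axis into the channel dimension so that a plain channel-mixing MLP can mix them, and cross-region rearrangement circularly shifts all tokens along one spatial direction. The region size, shift step, MLP widths, and module names are illustrative assumptions, not the authors' reference code.

```python
import torch
import torch.nn as nn

class InnerRegionMix(nn.Module):
    """Sketch of inner-region rearrangement along the height axis:
    fold each group of `region` rows into the channel dimension,
    apply a channel-mixing MLP, then unfold back."""
    def __init__(self, dim, region=4):
        super().__init__()
        self.region = region
        self.mlp = nn.Sequential(
            nn.Linear(dim * region, dim * region // 2),
            nn.GELU(),
            nn.Linear(dim * region // 2, dim * region),
        )

    def forward(self, x):                      # x: (B, H, W, C), H % region == 0
        B, H, W, C = x.shape
        r = self.region
        x = x.reshape(B, H // r, r, W, C)                       # group rows into regions
        x = x.permute(0, 1, 3, 2, 4).reshape(B, H // r, W, r * C)
        x = self.mlp(x)                                         # channel mixing only
        x = x.reshape(B, H // r, W, r, C).permute(0, 1, 3, 2, 4)
        return x.reshape(B, H, W, C)

def cross_region_shift(x, step=2, dim=1):
    """Sketch of cross-region rearrangement: circularly shift all tokens
    along one spatial axis so neighbouring regions exchange information."""
    return torch.roll(x, shifts=step, dims=dim)

# Illustrative usage on a dummy feature map
feat = torch.randn(2, 16, 16, 64)              # (batch, height, width, channels)
out = InnerRegionMix(dim=64, region=4)(cross_region_shift(feat))
print(out.shape)                               # torch.Size([2, 16, 16, 64])
```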
Related papers
- R2-MLP: Round-Roll MLP for Multi-View 3D Object Recognition [33.53114929452528]
Vision architectures based exclusively on multi-layer perceptrons (MLPs) have gained much attention in the computer vision community.
We present R$^2$-MLP, which tackles view-based 3D object recognition by considering the communications between patches from different views.
With a conceptually simple structure, our R$^2$-MLP achieves competitive performance compared with existing methods.
arXiv Detail & Related papers (2022-11-20T21:13:02Z) - ReLU Fields: The Little Non-linearity That Could [62.228229880658404]
We investigate the smallest change to grid-based representations that allows for retaining the high-fidelity results of MLPs.
We show that such an approach becomes competitive with the state-of-the-art.
arXiv Detail & Related papers (2022-05-22T13:42:31Z) - Mixing and Shifting: Exploiting Global and Local Dependencies in Vision
MLPs [84.3235981545673]
Token-mixing multi-layer perceptron (MLP) models have shown competitive performance in computer vision tasks.
We present Mix-Shift-MLP which makes the size of the local receptive field used for mixing increase with respect to the amount of spatial shifting.
MS-MLP achieves competitive performance in multiple vision benchmarks.
arXiv Detail & Related papers (2022-02-14T06:53:48Z) - RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality [113.1414517605892]
We propose a methodology, Locality Injection, to incorporate local priors into an FC layer.
RepMLPNet is the first MLP model that seamlessly transfers to Cityscapes semantic segmentation.
arXiv Detail & Related papers (2021-12-21T10:28:17Z) - CycleMLP: A MLP-like Architecture for Dense Prediction [26.74203747156439]
CycleMLP is a versatile backbone for visual recognition and dense predictions.
It can cope with various image sizes and achieves linear computational complexity with respect to image size by using local windows.
CycleMLP aims to provide a competitive baseline on object detection, instance segmentation, and semantic segmentation for MLP models.
arXiv Detail & Related papers (2021-07-21T17:23:06Z) - AS-MLP: An Axial Shifted MLP Architecture for Vision [50.11765148947432]
An Axial Shifted MLP architecture (AS-MLP) is proposed in this paper.
By axially shifting channels of the feature map, AS-MLP is able to obtain information flow from different directions (a sketch of this shared shifting idea appears after this list).
With the proposed AS-MLP architecture, our model obtains 83.3% Top-1 accuracy with 88M parameters and 15.2 GFLOPs on the ImageNet-1K dataset.
arXiv Detail & Related papers (2021-07-18T08:56:34Z) - S$^2$-MLP: Spatial-Shift MLP Architecture for Vision [34.47616917228978]
Recently, the vision Transformer (ViT) and its follow-up works abandon convolution and exploit the self-attention operation.
In this paper, we propose a novel pure MLP architecture, spatial-shift MLP (S$^2$-MLP).
arXiv Detail & Related papers (2021-06-14T15:05:11Z) - MLP-Mixer: An all-MLP Architecture for Vision [93.16118698071993]
We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs).
Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models.
arXiv Detail & Related papers (2021-05-04T16:17:21Z)
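Several of the shift-based entries above (AS-MLP, S$^2$-MLP, Mix-Shift-MLP) share one core idea: split the channels into groups and displace each group along a spatial axis before channel mixing. The sketch below illustrates only that shared idea; the group count, offsets, and the use of a circular roll (the papers differ in padding and direction handling) are assumptions for illustration, not any one paper's exact configuration.

```python
import torch

def axial_shift(x, shift=1, dim=1):
    """Illustrative axial/spatial shift: split channels into 2*shift+1 groups
    and roll each group by a different offset along one spatial axis (here the
    height axis), so a following channel-mixing MLP sees tokens from
    neighbouring positions. Assumes x has shape (B, H, W, C) with C >= 2*shift+1."""
    groups = torch.chunk(x, chunks=2 * shift + 1, dim=-1)        # channel groups
    shifted = [
        torch.roll(g, shifts=offset, dims=dim)                   # shift each group
        for g, offset in zip(groups, range(-shift, shift + 1))
    ]
    return torch.cat(shifted, dim=-1)                            # regroup channels
```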