X-MLP: A Patch Embedding-Free MLP Architecture for Vision
- URL: http://arxiv.org/abs/2307.00592v1
- Date: Sun, 2 Jul 2023 15:20:25 GMT
- Title: X-MLP: A Patch Embedding-Free MLP Architecture for Vision
- Authors: Xinyue Wang, Zhicheng Cai and Chenglei Peng
- Abstract summary: Research on multi-layer perceptron (MLP) architectures for vision has recently become popular again.
We propose X-MLP, an architecture built entirely upon fully connected layers and free from patch embedding.
X-MLP is tested on ten benchmark datasets and achieves better performance than other vision MLP models on all of them.
- Score: 4.493200639605705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks (CNNs) and vision transformers (ViTs) have
achieved great success in computer vision. Recently, research on multi-layer
perceptron (MLP) architectures for vision has become popular again. Vision MLPs
are designed to be independent of convolution and self-attention operations.
However, existing vision MLP architectures still depend on convolution for
patch embedding. We therefore propose X-MLP, an architecture built entirely
upon fully connected layers and free from patch embedding. It fully decouples
the features and uses MLPs to exchange information across the width, height and
channel dimensions independently and alternately. X-MLP is tested on ten
benchmark datasets and outperforms other vision MLP models on all of them. It
even surpasses CNNs by a clear margin on various datasets. Furthermore, by
mathematically reconstructing the spatial weights, we visualize the information
communication between any pair of pixels in the feature map and observe that
long-range dependencies are captured.
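The abstract describes the mixing scheme only at a high level. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: fully connected layers applied alternately along the width, height and channel axes of the raw feature map, with no patch embedding. The block layout, normalization, activation and residual connection are assumptions.

```python
import torch
import torch.nn as nn

class XMLPBlock(nn.Module):
    """One mixing block: fully connected layers along W, then H, then C."""
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.mix_w = nn.Linear(width, width)        # mixes positions along width
        self.mix_h = nn.Linear(height, height)      # mixes positions along height
        self.mix_c = nn.Linear(channels, channels)  # mixes channels per pixel
        self.act = nn.GELU()

    def forward(self, x):  # x: (B, H, W, C), raw pixels, no patch embedding
        y = self.norm(x)
        # Width mixing: move W to the last axis, apply the linear layer, move back.
        y = self.act(self.mix_w(y.permute(0, 1, 3, 2))).permute(0, 1, 3, 2)
        # Height mixing: move H to the last axis, apply the linear layer, move back.
        y = self.act(self.mix_h(y.permute(0, 3, 2, 1))).permute(0, 3, 2, 1)
        # Channel mixing: C is already the last axis.
        y = self.act(self.mix_c(y))
        return x + y  # residual connection (assumed)

# Usage: a CIFAR-sized input enters the block directly as (B, H, W, C).
x = torch.randn(2, 32, 32, 3)
block = XMLPBlock(channels=3, height=32, width=32)
out = block(x)  # shape: (2, 32, 32, 3)
```

A full model would presumably stack several such blocks and finish with global pooling and a classifier head; those details are not specified in this summary.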
Related papers
- R2-MLP: Round-Roll MLP for Multi-View 3D Object Recognition [33.53114929452528]
Vision architectures based exclusively on multi-layer perceptrons (MLPs) have gained much attention in the computer vision community.
We present R2-MLP, which achieves view-based 3D object recognition by considering the communication between patches from different views.
With a conceptually simple structure, our R2-MLP achieves competitive performance compared with existing methods.
arXiv Detail & Related papers (2022-11-20T21:13:02Z) - GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation [68.65764751482774]
GraphMLP is a global-local-graphical unified architecture for 3D human pose estimation.
It incorporates the graph structure of human bodies into a model to meet the domain-specific demand of the 3D human pose.
It can be extended to model complex temporal dynamics in a simple way, with a negligible increase in computational cost as the sequence length grows.
arXiv Detail & Related papers (2022-06-13T18:59:31Z) - MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing [123.43419144051703]
We present a novel MLP-like 3D architecture for video recognition.
The results are comparable to those of state-of-the-art, widely used 3D CNNs and video transformers.
arXiv Detail & Related papers (2022-06-13T16:21:33Z) - MDMLP: Image Classification from Scratch on Small Datasets with MLP [7.672827879118106]
Recently, the attention mechanism has become a go-to technique for natural language processing and computer vision tasks.
Recently, MLP-Mixer and other MLP-based architectures, built simply from multi-layer perceptrons (MLPs), have also proven powerful compared with CNNs and attention techniques.
arXiv Detail & Related papers (2022-05-28T16:26:59Z) - An Image Patch is a Wave: Phase-Aware Vision MLP [54.104040163690364]
The vision multilayer perceptron (MLP) is a new kind of vision model with an extremely simple architecture, built only by stacking fully connected layers.
We propose to represent each token as a wave function with two parts, amplitude and phase.
Experiments demonstrate that the proposed Wave-MLP is superior to the state-of-the-art architectures on various vision tasks.
arXiv Detail & Related papers (2021-11-24T06:25:49Z) - Are we ready for a new paradigm shift? A Survey on Visual Deep MLP [33.00328314841369]
Multilayer perceptron (MLP), as the first neural network structure to appear, was a big hit.
Constrained by the hardware computing power and the dataset sizes of the time, it sank into obscurity for decades.
We have witnessed a paradigm shift from manual feature extraction to CNNs with local receptive fields, and further to Transformers with global receptive fields.
arXiv Detail & Related papers (2021-11-07T12:02:00Z) - Hire-MLP: Vision MLP via Hierarchical Rearrangement [58.33383667626998]
Hire-MLP is a simple yet competitive vision MLP architecture built via hierarchical rearrangement.
The proposed Hire-MLP architecture is built with simple channel-mixing operations, thus enjoys high flexibility and inference speed.
Experiments show that our Hire-MLP achieves state-of-the-art performance on the ImageNet-1K benchmark.
arXiv Detail & Related papers (2021-08-30T16:11:04Z) - AS-MLP: An Axial Shifted MLP Architecture for Vision [50.11765148947432]
An Axial Shifted MLP architecture (AS-MLP) is proposed.
By axially shifting channels of the feature map, AS-MLP is able to obtain information flow from different directions (see the sketch after this list).
With the proposed AS-MLP architecture, our model obtains 83.3% Top-1 accuracy with 88M parameters and 15.2 GFLOPs on the ImageNet-1K dataset.
arXiv Detail & Related papers (2021-07-18T08:56:34Z) - MLP-Mixer: An all-MLP Architecture for Vision [93.16118698071993]
We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs).
MLP-Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models.
arXiv Detail & Related papers (2021-05-04T16:17:21Z)
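For the axial shift mentioned in the AS-MLP entry above, the following is a minimal sketch of the general idea as I understand it, not the AS-MLP authors' code: channel groups are shifted by different offsets along one spatial axis, so that a subsequent channel MLP can mix information from neighbouring positions. Note that torch.roll wraps around at the borders, whereas the paper may handle borders differently.

```python
import torch

def axial_shift(x: torch.Tensor, dim: int, shift_size: int = 3) -> torch.Tensor:
    """x: (B, C, H, W); dim=2 shifts along height, dim=3 along width."""
    groups = torch.chunk(x, shift_size, dim=1)                 # split channels into groups
    offsets = range(-(shift_size // 2), shift_size // 2 + 1)   # e.g. -1, 0, +1
    shifted = [torch.roll(g, shifts=o, dims=dim) for g, o in zip(groups, offsets)]
    return torch.cat(shifted, dim=1)

# Usage: shift along H, then along W; the tensor shape is unchanged.
x = torch.randn(2, 6, 8, 8)
y = axial_shift(axial_shift(x, dim=2), dim=3)
```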