BiMLP: Compact Binary Architectures for Vision Multi-Layer Perceptrons
- URL: http://arxiv.org/abs/2212.14158v1
- Date: Thu, 29 Dec 2022 02:43:41 GMT
- Title: BiMLP: Compact Binary Architectures for Vision Multi-Layer Perceptrons
- Authors: Yixing Xu, Xinghao Chen, Yunhe Wang
- Abstract summary: This paper studies the problem of designing compact binary architectures for vision multi-layer perceptrons (MLPs).
We find that previous binarization methods perform poorly due to the limited capacity of binary MLPs.
We propose to improve the performance of the binary MLP (BiMLP) model by enriching the representation ability of binary FC layers.
- Score: 37.28828605119602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the problem of designing compact binary architectures for
vision multi-layer perceptrons (MLPs). We provide extensive analysis on the
difficulty of binarizing vision MLPs and find that previous binarization
methods perform poorly due to the limited capacity of binary MLPs. In contrast
with traditional CNNs that utilize convolutional operations with large kernel
sizes, the fully-connected (FC) layers in MLPs can be treated as convolutional
layers with kernel size $1\times1$. Thus, the representation ability of the FC
layers is limited when they are binarized, which places restrictions on the
capability of spatial mixing and channel mixing on the intermediate features.
To this end, we propose to improve the performance of the binary MLP (BiMLP) model
by enriching the representation ability of binary FC layers. We design a novel
binary block that contains multiple branches to merge a series of outputs from
the same stage, and also a universal shortcut connection that encourages the
information flow from the previous stage. The downsampling layers are also
carefully designed to reduce the computational complexity while maintaining the
classification performance. Experimental results on the benchmark dataset
ImageNet-1k demonstrate the effectiveness of the proposed BiMLP models, which
achieve state-of-the-art accuracy compared to prior binary CNNs. The MindSpore
code is available at
\url{https://gitee.com/mindspore/models/tree/master/research/cv/BiMLP}.
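To make the abstract's two main ideas concrete, the following is a minimal PyTorch-style sketch of (a) an FC layer (equivalently a $1\times1$ convolution) binarized with a sign function, a per-channel scale, and a straight-through estimator, and (b) a block that merges multiple binary branches under an identity shortcut. The binarization scheme follows common XNOR-Net-style practice; the class names, branch count, and normalization placement are illustrative assumptions, not the exact BiMLP design (the official MindSpore code is linked above).

```python
# Hedged sketch: binary FC layer + multi-branch binary block with shortcut.
# Illustrative only; not the official BiMLP implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarizeSTE(torch.autograd.Function):
    """sign() in the forward pass; clipped straight-through gradient in backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Pass gradients only where |x| <= 1 (the usual clipped STE).
        return grad_out * (x.abs() <= 1).to(grad_out.dtype)


class BinaryFC(nn.Module):
    """An FC layer (equivalently a 1x1 convolution) with binary weights and inputs."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)

    def forward(self, x):  # x: (batch, tokens, in_features)
        bw = BinarizeSTE.apply(self.weight)
        bx = BinarizeSTE.apply(x)
        # A per-output-channel scale recovers some of the lost dynamic range.
        alpha = self.weight.abs().mean(dim=1)
        return F.linear(bx, bw) * alpha


class MultiBranchBinaryBlock(nn.Module):
    """Merges two parallel binary FC branches and keeps an identity shortcut."""

    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.branch_a = BinaryFC(dim, dim)
        self.branch_b = BinaryFC(dim, dim)

    def forward(self, x):
        y = self.norm(x)
        # Summing several binary branches enlarges the small set of values a
        # single binary FC layer can express; the shortcut keeps full-precision
        # information flowing from the previous stage.
        return x + self.branch_a(y) + self.branch_b(y)


if __name__ == "__main__":
    block = MultiBranchBinaryBlock(dim=96)
    tokens = torch.randn(2, 49, 96)   # (batch, tokens, channels)
    print(block(tokens).shape)        # torch.Size([2, 49, 96])
```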
Related papers
- SCHEME: Scalable Channel Mixer for Vision Transformers [52.605868919281086]
Vision Transformers have achieved impressive performance in many vision tasks.
Much less research has been devoted to the channel mixer or feature mixing block (FFN or MLP).
We show that the dense connections can be replaced with a diagonal block structure that supports larger expansion ratios (a generic sketch of such a block-diagonal mixer appears after this list).
arXiv Detail & Related papers (2023-12-01T08:22:34Z)
- Caterpillar: A Pure-MLP Architecture with Shifted-Pillars-Concatenation [68.24659910441736]
Shifted-Pillars-Concatenation (SPC) module offers superior local modeling power and performance gains.
We build a pure-MLP architecture called Caterpillar by replacing the convolutional layer with the SPC module in a hybrid model of sMLPNet.
Experiments show Caterpillar's excellent performance on both small-scale and ImageNet-1k classification benchmarks.
arXiv Detail & Related papers (2023-05-28T06:19:36Z)
- UNeXt: MLP-based Rapid Medical Image Segmentation Network [80.16644725886968]
UNet and its latest extensions like TransUNet have been the leading medical image segmentation methods in recent years.
We propose UNeXt, a Convolutional multilayer perceptron (MLP) based network for image segmentation.
We show that we reduce the number of parameters by 72x, decrease the computational complexity by 68x, and improve the inference speed by 10x while also obtaining better segmentation performance.
arXiv Detail & Related papers (2022-03-09T18:58:22Z)
- RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality [113.1414517605892]
We propose a methodology, Locality Injection, to incorporate local priors into an FC layer.
RepMLPNet is the first MLP model that seamlessly transfers to Cityscapes semantic segmentation.
arXiv Detail & Related papers (2021-12-21T10:28:17Z)
- ConvMLP: Hierarchical Convolutional MLPs for Vision [7.874749885641495]
We propose a hierarchical ConvMLP: a light-weight, stage-wise co-design for visual recognition.
We show that ConvMLP can be seamlessly transferred and achieve competitive results with fewer parameters.
arXiv Detail & Related papers (2021-09-09T17:52:57Z)
- Sparse-MLP: A Fully-MLP Architecture with Conditional Computation [7.901786481399378]
Mixture-of-Experts (MoE) with sparse conditional computation has proven to be an effective architecture for scaling attention-based models to more parameters with comparable computation cost.
We propose Sparse-MLP, scaling the recent MLP-Mixer model with MoE to achieve a more efficient architecture.
arXiv Detail & Related papers (2021-09-05T06:43:08Z)
- CycleMLP: A MLP-like Architecture for Dense Prediction [26.74203747156439]
CycleMLP is a versatile backbone for visual recognition and dense predictions.
It can cope with various image sizes and achieves computational complexity linear in image size by using local windows.
CycleMLP aims to provide a competitive baseline on object detection, instance segmentation, and semantic segmentation for MLP-like models.
arXiv Detail & Related papers (2021-07-21T17:23:06Z)
- RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition [123.59890802196797]
We propose RepMLP, a multi-layer-perceptron-style neural network building block for image recognition.
We construct convolutional layers inside a RepMLP during training and merge them into the FC for inference (a toy demonstration of this merging appears after this list).
By inserting RepMLP in traditional CNN, we improve ResNets by 1.8% accuracy on ImageNet, 2.9% for face recognition, and 2.3% mIoU on Cityscapes with lower FLOPs.
arXiv Detail & Related papers (2021-05-05T06:17:40Z)
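The RepMLP entry above hinges on the fact that, on a fixed-size feature map, a convolution is a linear map and can therefore be merged into an equivalent fully-connected matrix for inference. Below is a hedged toy demonstration of that equivalence; the `conv_to_fc_matrix` helper and all sizes are illustrative assumptions, not the RepMLP implementation.

```python
# Toy demonstration: a conv on a fixed-size feature map equals an FC layer.
# Illustrative only; not the RepMLP code.
import torch
import torch.nn as nn


def conv_to_fc_matrix(conv: nn.Conv2d, c: int, h: int, w: int) -> torch.Tensor:
    """Return the (c*h*w, c*h*w) matrix equivalent to `conv` on c x h x w inputs."""
    eye = torch.eye(c * h * w).reshape(c * h * w, c, h, w)
    with torch.no_grad():
        out = conv(eye)                   # conv applied to each basis vector
    return out.reshape(c * h * w, -1).t() # basis responses become matrix columns


c, h, w = 4, 7, 7
conv = nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False)
fc_weight = conv_to_fc_matrix(conv, c, h, w)

x = torch.randn(1, c, h, w)
y_conv = conv(x).reshape(-1)
y_fc = fc_weight @ x.reshape(-1)
print(torch.allclose(y_conv, y_fc, atol=1e-5))  # True
```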
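The SCHEME entry above mentions replacing dense channel-mixing connections with a diagonal block structure. As a rough illustration of why this frees budget for larger expansion ratios, here is a generic block-diagonal channel mixer built from grouped 1x1 convolutions; the class name, group count, and activation placement are assumptions, not the SCHEME design.

```python
# Generic block-diagonal channel mixer (grouped FC layers). Illustrative only.
import torch
import torch.nn as nn


class BlockDiagonalMixer(nn.Module):
    """Channel mixer whose weight matrices are block-diagonal with `groups` blocks."""

    def __init__(self, dim: int, expansion: int = 4, groups: int = 4):
        super().__init__()
        assert dim % groups == 0
        # A grouped 1x1 convolution realizes a block-diagonal weight matrix:
        # each group of dim/groups channels is mixed by its own dense block.
        self.up = nn.Conv1d(dim, dim * expansion, kernel_size=1, groups=groups)
        self.act = nn.GELU()
        self.down = nn.Conv1d(dim * expansion, dim, kernel_size=1, groups=groups)

    def forward(self, x):  # x: (batch, tokens, dim)
        x = x.transpose(1, 2)                 # -> (batch, dim, tokens)
        x = self.down(self.act(self.up(x)))
        return x.transpose(1, 2)


mixer = BlockDiagonalMixer(dim=64, expansion=8, groups=4)
print(mixer(torch.randn(2, 49, 64)).shape)  # torch.Size([2, 49, 64])
# Parameters per FC scale as dim^2 * expansion / groups, which is why a
# block-diagonal mixer can afford a larger expansion ratio at equal cost.
```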