Rethinking Token-Mixing MLP for MLP-based Vision Backbone
- URL: http://arxiv.org/abs/2106.14882v1
- Date: Mon, 28 Jun 2021 17:59:57 GMT
- Title: Rethinking Token-Mixing MLP for MLP-based Vision Backbone
- Authors: Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, Ping Li
- Abstract summary: We propose an improved structure termed the Circulant Channel-Specific (CCS) token-mixing MLP, which is spatial-invariant and channel-specific.
It uses fewer parameters but achieves higher classification accuracy on the ImageNet1K benchmark.
- Score: 34.47616917228978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the past decade, we have witnessed rapid progress in machine vision
backbones. By introducing inductive biases from image processing, convolutional
neural networks (CNNs) have achieved excellent performance in numerous computer
vision tasks and have been established as the \emph{de facto} backbone. In
recent years, inspired by the great success achieved by Transformers in NLP
tasks, vision Transformer models have emerged. Using much less inductive bias,
they have achieved promising performance in computer vision tasks compared with
their CNN counterparts. More recently, researchers have investigated using
pure-MLP architectures to build the vision backbone and further reduce the
inductive bias, achieving good performance. The pure-MLP backbone is built upon
channel-mixing MLPs to fuse the channels and token-mixing MLPs for
communication between patches. In this paper, we rethink the design of the
token-mixing MLP. We find that token-mixing MLPs in existing MLP-based
backbones are spatial-specific and thus sensitive to spatial translation.
Meanwhile, the channel-agnostic property of existing token-mixing MLPs limits
their capability in mixing tokens. To overcome these limitations, we propose an
improved structure termed the Circulant Channel-Specific (CCS) token-mixing
MLP, which is spatial-invariant and channel-specific. It uses fewer parameters
but achieves higher classification accuracy on the ImageNet1K benchmark.
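
The abstract characterizes the CCS token-mixing MLP as spatial-invariant (circulant structure) and channel-specific. Below is a minimal PyTorch sketch of that idea, assuming token mixing is realized as a per-channel circular convolution over the N tokens (one length-N generator vector per channel, applied via FFT). Class name, shapes, and initialization are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class CCSTokenMixing(nn.Module):
    """Sketch of a circulant, channel-specific token-mixing layer.

    Each channel owns a length-N generator vector; the token-mixing matrix
    for that channel is the circulant matrix built from it, so mixing is
    invariant to circular spatial shifts and differs per channel. The
    circulant matrix-vector product is computed with FFTs.
    """

    def __init__(self, num_tokens: int, dim: int):
        super().__init__()
        # One circulant generator per channel: dim * num_tokens parameters,
        # instead of a dense num_tokens x num_tokens matrix shared by all channels.
        self.weight = nn.Parameter(torch.randn(dim, num_tokens) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim)
        b, n, d = x.shape
        # FFT along the token axis; one spectrum per channel is broadcast over the batch.
        x_f = torch.fft.rfft(x.transpose(1, 2), dim=-1)   # (b, d, n//2 + 1)
        w_f = torch.fft.rfft(self.weight, dim=-1)         # (d, n//2 + 1)
        y_f = x_f * w_f.unsqueeze(0)                      # circular convolution per channel
        y = torch.fft.irfft(y_f, n=n, dim=-1)             # (b, d, n)
        return y.transpose(1, 2)                          # (b, n, d)


if __name__ == "__main__":
    # e.g. 14x14 = 196 patch tokens with 384 channels (values chosen for illustration)
    layer = CCSTokenMixing(num_tokens=196, dim=384)
    tokens = torch.randn(2, 196, 384)
    print(layer(tokens).shape)  # torch.Size([2, 196, 384])
```

Under these assumptions, the parameter count scales with dim * num_tokens rather than with the square of the token count, which is consistent with the abstract's claim of fewer parameters than the spatial-specific, channel-agnostic token-mixing MLP.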