ActiveMLP: An MLP-like Architecture with Active Token Mixer
- URL: http://arxiv.org/abs/2203.06108v1
- Date: Fri, 11 Mar 2022 17:29:54 GMT
- Title: ActiveMLP: An MLP-like Architecture with Active Token Mixer
- Authors: Guoqiang Wei, Zhizheng Zhang, Cuiling Lan, Yan Lu, Zhibo Chen
- Abstract summary: This paper presents ActiveMLP, a general MLP-like backbone for computer vision.
We propose an innovative token-mixer, dubbed Active Token Mixer (ATM), to actively incorporate contextual information from other tokens in the global scope into the given one.
In this way, the spatial range of token-mixing is expanded and the way of token-mixing is reformed.
- Score: 54.95923719553343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents ActiveMLP, a general MLP-like backbone for computer
vision. The three existing dominant network families, i.e., CNNs, Transformers
and MLPs, differ from each other mainly in the ways to fuse contextual
information into a given token, leaving the design of more effective
token-mixing mechanisms at the core of backbone architecture development. In
ActiveMLP, we propose an innovative token-mixer, dubbed Active Token Mixer
(ATM), to actively incorporate contextual information from other tokens in the
global scope into the given one. This fundamental operator actively predicts
where to capture useful contexts and learns how to fuse the captured contexts
with the original information of the given token at channel levels. In this
way, the spatial range of token-mixing is expanded and the way of token-mixing
is reformed. With this design, ActiveMLP is endowed with the merits of global
receptive fields and more flexible content-adaptive information fusion.
Extensive experiments demonstrate that ActiveMLP is generally applicable and
comprehensively surpasses different families of SOTA vision backbones by a
clear margin on a broad range of vision tasks, including visual recognition and
dense prediction tasks. The code and models will be available at
https://github.com/microsoft/ActiveMLP.
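The abstract describes the Active Token Mixer as an operator that predicts, per channel, where to capture useful context and how to fuse it with the original token. Below is a minimal NumPy sketch of that idea under stated assumptions; it is not the authors' implementation, and the function name, weight shapes, and the rounding/gating choices are illustrative.

```python
import numpy as np

def atm_sketch(tokens, w_offset, w_fuse):
    """Sketch of an Active-Token-Mixer-style operation (illustrative, not the paper's code).

    tokens:   (N, C) array, a sequence of N tokens with C channels.
    w_offset: (C, C) weights predicting, per channel, where to look for context.
    w_fuse:   (C,)   per-channel gate mixing gathered context with the original token.
    """
    n, c = tokens.shape
    # Predict a continuous source position per token and channel, then round and
    # clamp it into the valid token range ("where to capture useful contexts").
    raw = tokens @ w_offset                                        # (N, C) offset logits
    src = np.clip(np.round(raw).astype(int) + np.arange(n)[:, None], 0, n - 1)
    # Gather, per channel, the value from the predicted source token (global scope).
    gathered = tokens[src, np.arange(c)]                           # (N, C) channel-wise gather
    # Fuse gathered context with the original information at the channel level.
    gate = 1.0 / (1.0 + np.exp(-w_fuse))                           # sigmoid gate per channel
    return gate * gathered + (1.0 - gate) * tokens
```

With zero offset weights each channel gathers from its own token, so the operation reduces to identity; nonzero offsets let different channels of the same token pull context from different positions.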
Related papers
- QbyE-MLPMixer: Query-by-Example Open-Vocabulary Keyword Spotting using
MLPMixer [10.503972720941693]
Current keyword spotting systems are typically trained with a large amount of pre-defined keywords.
We propose an open-vocabulary neural network based on the MLPMixer model architecture.
Our proposed model has a smaller number of parameters and MACs compared to the baseline models.
arXiv Detail & Related papers (2022-06-23T18:18:44Z)
- MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing [123.43419144051703]
We present a novel MLP-like 3D architecture for video recognition.
The results are comparable to those of state-of-the-art, widely used 3D CNNs and video transformers.
arXiv Detail & Related papers (2022-06-13T16:21:33Z)
- An Image Patch is a Wave: Phase-Aware Vision MLP [54.104040163690364]
The multilayer perceptron (MLP) is a new kind of vision model with an extremely simple architecture, built only from stacked fully-connected layers.
We propose to represent each token as a wave function with two parts, amplitude and phase.
Experiments demonstrate that the proposed Wave-MLP is superior to the state-of-the-art architectures on various vision tasks.
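The summary describes representing each token as a wave with an amplitude and a phase. A minimal NumPy sketch of the complex-valued view, assuming a simple superposition as the aggregation (the function names and the aggregation rule are illustrative, not Wave-MLP's actual formulation):

```python
import numpy as np

def wave_token(amplitude, phase):
    """Represent a token as a complex wave: amplitude carries the content,
    phase modulates how tokens interact when aggregated."""
    return amplitude * np.exp(1j * phase)

def superpose(amplitudes, phases):
    """Superpose wave tokens along the token axis: tokens with aligned
    phases reinforce each other, tokens with opposite phases cancel."""
    return np.sum(wave_token(amplitudes, phases), axis=0)
```

For example, two unit-amplitude tokens with phases 0 and π cancel under superposition, while two tokens with identical phases add constructively; the phase thus controls how much each token contributes.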
arXiv Detail & Related papers (2021-11-24T06:25:49Z)
- Hire-MLP: Vision MLP via Hierarchical Rearrangement [58.33383667626998]
Hire-MLP is a simple yet competitive vision MLP architecture via hierarchical rearrangement.
The proposed Hire-MLP architecture is built with simple channel-mixing operations, thus enjoys high flexibility and inference speed.
Experiments show that our Hire-MLP achieves state-of-the-art performance on the ImageNet-1K benchmark.
arXiv Detail & Related papers (2021-08-30T16:11:04Z)
- AS-MLP: An Axial Shifted MLP Architecture for Vision [50.11765148947432]
An Axial Shifted MLP architecture (AS-MLP) is proposed in this paper.
By axially shifting channels of the feature map, AS-MLP is able to obtain the information flow from different directions.
With the proposed AS-MLP architecture, our model obtains 83.3% Top-1 accuracy with 88M parameters and 15.2 GFLOPs on the ImageNet-1K dataset.
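The axial shift described above can be sketched in a few lines of NumPy: channel groups of the feature map are shifted by different amounts along one spatial axis, so a subsequent channel-wise operation sees information from neighboring positions. This is a simplified illustration only; AS-MLP shifts with zero-padding rather than the circular roll used here, and the function name and grouping are assumptions.

```python
import numpy as np

def axial_shift(x, shift_size=3):
    """Shift channel groups of a (H, W, C) feature map by different offsets
    along the W axis, mimicking the axial-shift idea (circular, for brevity)."""
    h, w, c = x.shape
    groups = np.array_split(np.arange(c), shift_size)
    out = np.zeros_like(x)
    for g, chans in enumerate(groups):
        s = g - shift_size // 2                 # shifts like -1, 0, +1
        out[:, :, chans] = np.roll(x[:, :, chans], s, axis=1)
    return out
```

Applying the same operation along the H axis as well gives information flow from both spatial directions, which is what lets the channel-mixing MLPs that follow capture local spatial context.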
arXiv Detail & Related papers (2021-07-18T08:56:34Z)
- Rethinking Token-Mixing MLP for MLP-based Vision Backbone [34.47616917228978]
We propose an improved structure, termed the Circulant Channel-Specific (CCS) token-mixing MLP, which is spatial-invariant and channel-specific.
It takes fewer parameters but achieves higher classification accuracy on ImageNet-1K.
arXiv Detail & Related papers (2021-06-28T17:59:57Z)
- MLP-Mixer: An all-MLP Architecture for Vision [93.16118698071993]
We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs).
Mixer attains competitive scores on image classification benchmarks, with pre-training and inference costs comparable to state-of-the-art models.
arXiv Detail & Related papers (2021-05-04T16:17:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.