DynaMixer: A Vision MLP Architecture with Dynamic Mixing
- URL: http://arxiv.org/abs/2201.12083v1
- Date: Fri, 28 Jan 2022 12:43:14 GMT
- Title: DynaMixer: A Vision MLP Architecture with Dynamic Mixing
- Authors: Ziyu Wang and Wenhao Jiang and Yiming Zhu and Li Yuan and Yibing Song
and Wei Liu
- Abstract summary: This paper presents an efficient MLP-like network architecture, dubbed DynaMixer, resorting to dynamic information fusion.
We propose a procedure, on which the DynaMixer model relies, to dynamically generate mixing matrices by leveraging the contents of all the tokens to be mixed.
Our proposed DynaMixer model (97M parameters) achieves 84.3% top-1 accuracy on the ImageNet-1K dataset, performing favorably against the state-of-the-art vision MLP models.
- Score: 38.23027495545522
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, MLP-like vision models have achieved promising performances on
mainstream visual recognition tasks. In contrast with vision transformers and
CNNs, the success of MLP-like models shows that simple information fusion
operations among tokens and channels can yield a good representation power for
deep recognition models. However, existing MLP-like models fuse tokens through
static fusion operations, lacking adaptability to the contents of the tokens to
be mixed. Thus, customary information fusion procedures are not effective
enough. To this end, this paper presents an efficient MLP-like network
architecture, dubbed DynaMixer, resorting to dynamic information fusion.
Critically, we propose a procedure, on which the DynaMixer model relies, to
dynamically generate mixing matrices by leveraging the contents of all the
tokens to be mixed. To reduce the time complexity and improve the robustness, a
dimensionality reduction technique and a multi-segment fusion mechanism are
adopted. Our proposed DynaMixer model (97M parameters) achieves 84.3% top-1
accuracy on the ImageNet-1K dataset without extra training data, performing
favorably against the state-of-the-art vision MLP models. When the number of
parameters is reduced to 26M, it still achieves 82.7% top-1 accuracy,
surpassing the existing MLP-like models with a similar capacity. The
implementation of DynaMixer will be made available to the public.
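The abstract's core idea, generating the token-mixing matrix from the token contents rather than using a fixed learned matrix, can be illustrated with a minimal numpy sketch. The weights below are random placeholders (the real model learns them), and the exact layer shapes are an assumption based on the abstract's description of dimensionality reduction followed by matrix generation; this is illustrative, not the authors' implementation.

```python
import numpy as np

def dynamic_mixing(tokens, d_reduced=2, rng=np.random.default_rng(0)):
    """Content-dependent token mixing in the spirit of DynaMixer.

    tokens: (N, D) array of N token embeddings.
    Unlike a static mixing matrix, the N x N matrix here is generated
    from the tokens themselves, so the fusion adapts to their contents.
    """
    N, D = tokens.shape

    # 1. Dimensionality reduction D -> d_reduced per token, which cuts
    #    the cost of generating the mixing matrix (as in the paper).
    W_reduce = rng.standard_normal((D, d_reduced)) / np.sqrt(D)
    reduced = tokens @ W_reduce                       # (N, d_reduced)

    # 2. Generate N x N mixing logits from the concatenated contents
    #    of all tokens to be mixed.
    W_gen = rng.standard_normal((N * d_reduced, N * N)) / np.sqrt(N * d_reduced)
    logits = (reduced.reshape(-1) @ W_gen).reshape(N, N)

    # 3. Row-wise softmax: each output token becomes a convex
    #    combination of the input tokens.
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)

    # 4. Apply the content-dependent mixing.
    return P @ tokens                                 # (N, D)

tokens = np.random.default_rng(1).standard_normal((8, 16))
mixed = dynamic_mixing(tokens)
```

Because the mixing matrix depends on the input, two different token sets are fused with two different matrices, which is exactly the adaptability the abstract argues static MLP mixers lack.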
Related papers
- Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking [6.9366619419210656]
Transformers have established themselves as the leading neural network model in natural language processing.
Recent research has explored replacing attention modules with other mechanisms, including those described by MetaFormers.
This paper integrates Krotov's hierarchical associative memory with MetaFormers, enabling a comprehensive representation of the Transformer block.
arXiv Detail & Related papers (2024-06-18T02:42:19Z) - SCHEME: Scalable Channel Mixer for Vision Transformers [52.605868919281086]
Vision Transformers have achieved impressive performance in many vision tasks.
Much less research has been devoted to the channel mixer or feature mixing block (FFN or MLP).
We show that the dense connections can be replaced with a diagonal block structure that supports larger expansion ratios.
arXiv Detail & Related papers (2023-12-01T08:22:34Z) - CS-Mixer: A Cross-Scale Vision MLP Model with Spatial-Channel Mixing [2.1016271540149636]
We propose a hierarchical Vision MLP that learns dynamic low-rank transformations for spatial-channel mixing through cross-scale local and global aggregation.
Our largest model, CS-Mixer-L, reaches 83.2% top-1 accuracy on ImageNet-1k with 13.7 GFLOPs and 94 M parameters.
arXiv Detail & Related papers (2023-08-25T13:18:14Z) - TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series
Forecasting [13.410217680999459]
Transformers have gained popularity in time series forecasting for their ability to capture long-sequence interactions.
High memory and computing requirements pose a critical bottleneck for long-term forecasting.
We propose TSMixer, a lightweight neural architecture composed of multi-layer perceptron (MLP) modules.
arXiv Detail & Related papers (2023-06-14T06:26:23Z) - SplitMixer: Fat Trimmed From MLP-like Models [53.12472550578278]
We present SplitMixer, a simple and lightweight isotropic-like architecture, for visual recognition.
It contains two types of interleaving convolutional operations to mix information across locations (spatial mixing) and channels (channel mixing)
arXiv Detail & Related papers (2022-07-21T01:37:07Z) - Parameterization of Cross-Token Relations with Relative Positional
Encoding for Vision MLP [52.25478388220691]
Vision multi-layer perceptrons (MLPs) have shown promising performance in computer vision tasks.
They use token-mixing layers to capture cross-token interactions, as opposed to the multi-head self-attention mechanism used by Transformers.
We propose a new positional spatial gating unit (PoSGU) to efficiently encode the cross-token relations for token mixing.
arXiv Detail & Related papers (2022-07-15T04:18:06Z) - pNLP-Mixer: an Efficient all-MLP Architecture for Language [10.634940525287014]
pNLP-Mixer model for on-device NLP achieves high weight-efficiency thanks to a novel projection layer.
We evaluate a pNLP-Mixer model of only one megabyte in size on two multi-lingual semantic parsing datasets, MTOP and multiATIS.
Our model consistently beats state-of-the-art tiny models twice its size by a margin of up to 7.8% on MTOP.
arXiv Detail & Related papers (2022-02-09T09:01:29Z) - Sparse MLP for Image Recognition: Is Self-Attention Really Necessary? [65.37917850059017]
We build an attention-free network called sMLPNet.
For 2D image tokens, sMLP applies 1D MLPs along the axial directions, and the parameters are shared among rows or columns.
When scaling up to 66M parameters, sMLPNet achieves 83.4% top-1 accuracy, which is on par with the state-of-the-art Swin Transformer.
arXiv Detail & Related papers (2021-09-12T04:05:15Z) - MLP-Mixer: An all-MLP Architecture for Vision [93.16118698071993]
We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs).
Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models.
arXiv Detail & Related papers (2021-05-04T16:17:21Z)
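For contrast with DynaMixer's dynamic fusion, the static mixing that MLP-Mixer popularized can be sketched in a few lines of numpy. This is a simplified block (ReLU instead of GELU, no layer normalization, random placeholder weights), so treat it as an illustration of the token-mix/channel-mix pattern, not the published architecture.

```python
import numpy as np

def mixer_block(x, hidden=32, rng=np.random.default_rng(0)):
    """Simplified MLP-Mixer style block with *static* mixing weights.

    x: (N, D) array of N tokens with D channels.
    """
    N, D = x.shape

    # Token mixing: an MLP acts across the token dimension, with the
    # same fixed weights regardless of the tokens' contents.
    W1 = rng.standard_normal((hidden, N)) / np.sqrt(N)
    W2 = rng.standard_normal((N, hidden)) / np.sqrt(hidden)
    x = x + W2 @ np.maximum(W1 @ x, 0.0)      # residual connection

    # Channel mixing: an MLP acts across the channel dimension,
    # shared by every token.
    V1 = rng.standard_normal((D, hidden)) / np.sqrt(D)
    V2 = rng.standard_normal((hidden, D)) / np.sqrt(hidden)
    x = x + np.maximum(x @ V1, 0.0) @ V2      # residual connection
    return x

x = np.random.default_rng(1).standard_normal((8, 16))
out = mixer_block(x)
```

Here `W1`/`W2` are the same for every input, which is precisely the static fusion that the DynaMixer abstract identifies as the limitation its content-generated mixing matrices address.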