UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
- URL: http://arxiv.org/abs/2207.05420v1
- Date: Tue, 12 Jul 2022 09:30:58 GMT
- Title: UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
- Authors: Jihao Liu and Xin Huang and Guanglu Song and Yu Liu and Hongsheng Li
- Abstract summary: We propose a novel unified architecture search approach for high-performance networks.
First, we model the very different searchable operators in a unified form.
Second, we propose context-aware downsampling modules (DSMs) to mitigate the gap between the different types of operators.
Third, we integrate operators and DSMs into a unified search space and search with a Reinforcement Learning-based search algorithm.
- Score: 39.489331136395535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, transformer and multi-layer perceptron (MLP) architectures have
achieved impressive results on various vision tasks. However, how to
effectively combine those operators to form high-performance hybrid visual
architectures still remains a challenge. In this work, we study the learnable
combination of convolution, transformer, and MLP by proposing a novel unified
architecture search approach. Our approach contains two key designs to achieve
the search for high-performance networks. First, we model the very different
searchable operators in a unified form, and thus enable the operators to be
characterized with the same set of configuration parameters. In this way, the
overall search space size is significantly reduced, and the total search cost
becomes affordable. Second, we propose context-aware downsampling modules
(DSMs) to mitigate the gap between the different types of operators. Our
proposed DSMs are able to better adapt features from different types of
operators, which is important for identifying high-performance hybrid
architectures. Finally, we integrate configurable operators and DSMs into a
unified search space and search with a Reinforcement Learning-based search
algorithm to fully explore the optimal combination of the operators. To this
end, we search a baseline network and scale it up to obtain a family of models,
named UniNets, which achieve much better accuracy and efficiency than previous
ConvNets and Transformers. In particular, our UniNet-B5 achieves 84.9% top-1
accuracy on ImageNet, outperforming EfficientNet-B7 and BoTNet-T7 with 44% and
55% fewer FLOPs respectively. By pretraining on the ImageNet-21K, our UniNet-B6
achieves 87.4%, outperforming Swin-L with 51% fewer FLOPs and 41% fewer
parameters. Code is available at https://github.com/Sense-X/UniNet.
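The abstract's first design point is that convolution, transformer, and MLP blocks can all be described by one shared set of configuration parameters. Below is a minimal sketch of what such a unified parameterization might look like; the names (`OpConfig`, `build_op`) and the exact fields are illustrative assumptions, not the authors' actual API.

```python
from dataclasses import dataclass

import torch.nn as nn

# Illustrative only: UniNet's real configuration schema may differ.
@dataclass
class OpConfig:
    op_type: str    # "conv" | "transformer" | "mlp"
    channels: int   # block width
    expansion: int  # expansion ratio shared by all three operator families

def build_op(cfg: OpConfig) -> nn.Module:
    hidden = cfg.channels * cfg.expansion
    if cfg.op_type == "conv":
        # Inverted-bottleneck convolution (MBConv-style) on (B, C, H, W).
        return nn.Sequential(
            nn.Conv2d(cfg.channels, hidden, 1), nn.GELU(),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden), nn.GELU(),
            nn.Conv2d(hidden, cfg.channels, 1),
        )
    if cfg.op_type == "transformer":
        # Self-attention block on (B, N, C) token sequences.
        return nn.TransformerEncoderLayer(
            d_model=cfg.channels, nhead=4, dim_feedforward=hidden,
            batch_first=True)
    # "mlp": channel-mixing block on (B, N, C) token sequences.
    return nn.Sequential(
        nn.Linear(cfg.channels, hidden), nn.GELU(),
        nn.Linear(hidden, cfg.channels),
    )

block = build_op(OpConfig(op_type="conv", channels=64, expansion=4))
```

Because every operator family reduces to the same (type, width, expansion) tuple, the search controller only has to emit one compact configuration per stage instead of exploring three incompatible spaces, which is what keeps the joint search cost affordable.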
Related papers
- Differentiable Model Scaling using Differentiable Topk [12.084701778797854]
This study introduces Differentiable Model Scaling (DMS), which increases the efficiency of searching for the optimal width and depth of a network.
Results consistently indicate that our DMS can find improved structures and outperforms state-of-the-art NAS methods.
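A toy reading of the DMS idea, assuming a smooth top-k channel mask with a learnable threshold; the exact differentiable top-k operator in the paper may differ.

```python
import torch

def soft_topk_mask(importance: torch.Tensor, threshold: torch.Tensor,
                   temperature: float = 0.1) -> torch.Tensor:
    # Smoothly keep channels whose importance exceeds a learnable threshold;
    # gradients reach both inputs, so the effective width is trainable.
    return torch.sigmoid((importance - threshold) / temperature)

importance = torch.randn(64, requires_grad=True)  # one score per channel
threshold = torch.zeros(1, requires_grad=True)    # learnable cutoff
mask = soft_topk_mask(importance, threshold)
width_cost = mask.sum()                           # differentiable width penalty
width_cost.backward()
```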
arXiv Detail & Related papers (2024-05-12T07:34:33Z)
- SimQ-NAS: Simultaneous Quantization Policy and Neural Architecture Search [6.121126813817338]
Recent one-shot Neural Architecture Search algorithms rely on training a hardware-agnostic super-network tailored to a specific task and then extracting efficient sub-networks for different hardware platforms.
We show that by using multi-objective search algorithms paired with lightly trained predictors, we can efficiently search for both the sub-network architecture and the corresponding quantization policy.
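As a hedged illustration of the joint search described above, the sketch below pairs a made-up accuracy predictor with a Pareto filter over (architecture, quantization policy) candidates; the encodings, predictor, and cost model are all invented for the example.

```python
import random

random.seed(0)
WIDTHS, BITS, NUM_LAYERS = [32, 64, 96], [4, 8], 6

def sample():
    # A candidate couples a sub-network architecture with a quantization policy.
    return ([random.choice(WIDTHS) for _ in range(NUM_LAYERS)],
            [random.choice(BITS) for _ in range(NUM_LAYERS)])

def predicted_accuracy(arch, quant):
    # Stand-in for a lightly trained accuracy predictor.
    return sum(w ** 0.5 + b for w, b in zip(arch, quant))

def cost(arch, quant):
    # Proxy hardware cost: bit-width-scaled multiply-accumulates.
    return sum(w * b for w, b in zip(arch, quant))

def dominates(a, b):
    return (predicted_accuracy(*a) >= predicted_accuracy(*b)
            and cost(*a) <= cost(*b)
            and (predicted_accuracy(*a) > predicted_accuracy(*b)
                 or cost(*a) < cost(*b)))

candidates = [sample() for _ in range(300)]
pareto = [c for c in candidates if not any(dominates(o, c) for o in candidates)]
print(len(pareto), "candidates on the predicted accuracy/cost front")
```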
arXiv Detail & Related papers (2023-12-19T22:08:49Z)
- Efficient Deep Spiking Multi-Layer Perceptrons with Multiplication-Free Inference [13.924924047051782]
Deep convolution architectures for Spiking Neural Networks (SNNs) have significantly enhanced image classification performance and reduced computational burdens.
This research explores a new pathway, drawing inspiration from the progress made in Multi-Layer Perceptrons (MLPs).
We propose an innovative spiking architecture that uses batch normalization to retain MFI compatibility.
We establish an efficient multi-stage spiking network that effectively blends global receptive fields with local feature extraction.
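One way to see the multiplication-free inference (MFI) constraint mentioned above: with binary spikes, a linear layer only ever sums selected weight entries. The sketch below is an illustrative reading of the abstract, not the paper's architecture; plausibly, a batch-norm's affine transform can be folded into the firing threshold at inference, which would preserve MFI.

```python
import torch

def spiking_linear(spikes: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # With {0, 1} spikes, spikes @ W.T reduces to summing selected weight
    # entries, so inference needs additions only.
    assert ((spikes == 0) | (spikes == 1)).all()
    return spikes @ weight.t()

x = (torch.rand(8, 128) > 0.7).float()  # random binary spike batch
w = torch.randn(256, 128)
membrane = spiking_linear(x, w)
spikes_out = (membrane > 1.0).float()   # hard firing threshold
```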
arXiv Detail & Related papers (2023-06-21T16:52:20Z)
- Systematic Architectural Design of Scale Transformed Attention Condenser DNNs via Multi-Scale Class Representational Response Similarity Analysis [93.0013343535411]
We propose a novel type of analysis called Multi-Scale Class Representational Response Similarity Analysis (ClassRepSim).
We show that adding STAC modules to ResNet style architectures can result in up to a 1.6% increase in top-1 accuracy.
Results from ClassRepSim analysis can be used to select an effective parameterization of the STAC module resulting in competitive performance.
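A rough sketch of one plausible form of class representational response similarity: per-class mean activations at a layer, compared pairwise by cosine similarity. The function name and details are guesses from the abstract, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def class_response_similarity(feats, labels, num_classes):
    # Per-class mean response at one layer, then a pairwise cosine-similarity
    # matrix across classes (assumes every class appears in the batch).
    means = torch.stack([feats[labels == c].mean(dim=0)
                         for c in range(num_classes)])
    means = F.normalize(means, dim=1)
    return means @ means.t()

feats = torch.randn(512, 64)           # pooled activations for 512 samples
labels = torch.randint(0, 10, (512,))  # 10 classes
print(class_response_similarity(feats, labels, 10).shape)  # (10, 10)
```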
arXiv Detail & Related papers (2023-06-16T18:29:26Z)
- Pruning Self-attentions into Convolutional Layers in Single Path [89.55361659622305]
Vision Transformers (ViTs) have achieved impressive performance over various computer vision tasks.
We propose Single-Path Vision Transformer pruning (SPViT) to efficiently and automatically compress the pre-trained ViTs.
Our SPViT can trim 52.0% FLOPs for DeiT-B and get an impressive 0.6% top-1 accuracy gain simultaneously.
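A toy single-path relaxation of the attention-vs-convolution choice: one learnable gate mixes the two branches during search and is binarized afterward so the losing branch can be pruned. SPViT itself derives the convolutional weights from the attention parameters rather than keeping two independent branches, so this is only a simplified illustration.

```python
import torch
import torch.nn as nn

class AttnOrConv(nn.Module):
    # Sigmoid-gated mix of self-attention and a depthwise convolution;
    # after search the gate is rounded and the unused branch removed.
    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.conv = nn.Conv1d(dim, dim, 3, padding=1, groups=dim)
        self.alpha = nn.Parameter(torch.zeros(1))  # architecture parameter

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        gate = torch.sigmoid(self.alpha)
        attn_out, _ = self.attn(x, x, x)
        conv_out = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return gate * attn_out + (1 - gate) * conv_out

print(AttnOrConv(64)(torch.randn(2, 16, 64)).shape)  # (2, 16, 64)
```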
arXiv Detail & Related papers (2021-11-23T11:35:54Z)
- UniNet: Unified Architecture Search with Convolution, Transformer, and MLP [62.401161377258234]
In this paper, we propose to jointly search the optimal combination of convolution, transformer, and MLP for building a series of all-operator network architectures.
We identify that the widely-used strided convolution or pooling based down-sampling modules become the performance bottlenecks when operators are combined to form a network.
To better tackle the global context captured by the transformer and MLP operators, we propose two novel context-aware down-sampling modules.
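A minimal sketch of the general idea behind a context-aware down-sampling module: fuse a local strided convolution with a globally pooled context gate, so the resolution change does not discard the global information that transformer and MLP blocks rely on. The module name and composition are illustrative assumptions, not the paper's two DSM designs.

```python
import torch
import torch.nn as nn

class ContextAwareDownsample(nn.Module):
    # Local strided conv modulated by a global-context channel gate.
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.local = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)
        self.context = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # squeeze to a global context vector
            nn.Conv2d(in_ch, out_ch, 1),
            nn.Sigmoid(),              # channel-wise gate in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.local(x) * self.context(x)  # gate broadcasts over H, W

x = torch.randn(1, 64, 56, 56)
print(ContextAwareDownsample(64, 128)(x).shape)  # (1, 128, 28, 28)
```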
arXiv Detail & Related papers (2021-10-08T11:09:40Z)
- DAAS: Differentiable Architecture and Augmentation Policy Search [107.53318939844422]
This work considers the possible coupling between neural architectures and data augmentation and proposes an effective algorithm jointly searching for them.
Our approach achieves 97.91% accuracy on CIFAR-10 and 76.6% Top-1 accuracy on the ImageNet dataset, showing the outstanding performance of our search algorithm.
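A toy version of coupling the two searches: architecture choices and augmentation-policy choices are both relaxed into softmax weights and updated by the same optimizer against one validation objective. The surrogate loss below stands in for a supernet's validation loss and is invented for the example.

```python
import torch

arch_logits = torch.zeros(3, requires_grad=True)  # 3 candidate ops
aug_logits = torch.zeros(4, requires_grad=True)   # 4 candidate augmentations
opt = torch.optim.Adam([arch_logits, aug_logits], lr=0.1)

def surrogate_val_loss(arch_w, aug_w):
    # Stand-in for the supernet's validation loss under the sampled policy.
    target_arch = torch.tensor([0.7, 0.2, 0.1])
    target_aug = torch.tensor([0.1, 0.4, 0.4, 0.1])
    return ((arch_w - target_arch) ** 2).sum() + ((aug_w - target_aug) ** 2).sum()

for _ in range(100):
    opt.zero_grad()
    loss = surrogate_val_loss(arch_logits.softmax(0), aug_logits.softmax(0))
    loss.backward()
    opt.step()

print(arch_logits.softmax(0), aug_logits.softmax(0))  # jointly tuned choices
```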
arXiv Detail & Related papers (2021-09-30T17:15:17Z)
- One-Shot Neural Ensemble Architecture Search by Diversity-Guided Search Space Shrinking [97.60915598958968]
We propose a one-shot neural ensemble architecture search (NEAS) solution that addresses the two challenges.
For the first challenge, we introduce a novel diversity-based metric to guide search space shrinking.
For the second challenge, we enable a new search dimension to learn layer sharing among different models for efficiency purposes.
arXiv Detail & Related papers (2021-04-01T16:29:49Z)
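For the NEAS entry above, a hedged sketch of what a diversity-based shrinking metric could look like: score each candidate operator by how often its predictions disagree with the others', then keep the most complementary ones. The disagreement measure is an illustrative choice, not the paper's metric.

```python
import torch

def disagreement(a: torch.Tensor, b: torch.Tensor) -> float:
    # Fraction of samples where two candidates' argmax predictions differ.
    return (a.argmax(1) != b.argmax(1)).float().mean().item()

# Hypothetical cached predictions of 4 candidate ops on a held-out batch.
preds = [torch.randn(256, 10) for _ in range(4)]

scores = [sum(disagreement(p, q) for q in preds if q is not p) / (len(preds) - 1)
          for p in preds]
keep = sorted(range(len(preds)), key=lambda i: scores[i], reverse=True)[:2]
print("ops kept after diversity-guided shrinking:", keep)
```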