UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
- URL: http://arxiv.org/abs/2207.05420v1
- Date: Tue, 12 Jul 2022 09:30:58 GMT
- Title: UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
- Authors: Jihao Liu and Xin Huang and Guanglu Song and Yu Liu and Hongsheng Li
- Abstract summary: We propose a novel unified architecture search approach for high-performance networks.
First, we model the very different searchable operators in a unified form.
Second, we propose context-aware downsampling modules (DSMs) to mitigate the gap between the different types of operators.
Third, we integrate operators and DSMs into a unified search space and search with a Reinforcement Learning-based search algorithm.
- Score: 39.489331136395535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, transformer and multi-layer perceptron (MLP) architectures have
achieved impressive results on various vision tasks. However, how to
effectively combine those operators to form high-performance hybrid visual
architectures still remains a challenge. In this work, we study the learnable
combination of convolution, transformer, and MLP by proposing a novel unified
architecture search approach. Our approach contains two key designs to achieve
the search for high-performance networks. First, we model the very different
searchable operators in a unified form, and thus enable the operators to be
characterized with the same set of configuration parameters. In this way, the
overall search space size is significantly reduced, and the total search cost
becomes affordable. Second, we propose context-aware downsampling modules
(DSMs) to mitigate the gap between the different types of operators. Our
proposed DSMs are able to better adapt features from different types of
operators, which is important for identifying high-performance hybrid
architectures. Finally, we integrate configurable operators and DSMs into a
unified search space and search with a Reinforcement Learning-based search
algorithm to fully explore the optimal combination of the operators. To this
end, we search a baseline network and scale it up to obtain a family of models,
named UniNets, which achieve much better accuracy and efficiency than previous
ConvNets and Transformers. In particular, our UniNet-B5 achieves 84.9% top-1
accuracy on ImageNet, outperforming EfficientNet-B7 and BoTNet-T7 with 44% and
55% fewer FLOPs respectively. By pretraining on the ImageNet-21K, our UniNet-B6
achieves 87.4%, outperforming Swin-L with 51% fewer FLOPs and 41% fewer
parameters. Code is available at https://github.com/Sense-X/UniNet.
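The abstract's first design point is that convolution, transformer, and MLP blocks can all be described by one shared set of configuration parameters. Below is a minimal sketch of what such a unified parameterization might look like; the names (`OpConfig`, `build_op`) and the exact fields are illustrative assumptions, not the authors' actual API.

```python
from dataclasses import dataclass

import torch.nn as nn

# Illustrative only: UniNet's real configuration schema may differ.
@dataclass
class OpConfig:
    op_type: str    # "conv" | "transformer" | "mlp"
    channels: int   # block width
    expansion: int  # expansion ratio shared by all three operator families

def build_op(cfg: OpConfig) -> nn.Module:
    hidden = cfg.channels * cfg.expansion
    if cfg.op_type == "conv":
        # Inverted-bottleneck convolution (MBConv-style) on (B, C, H, W).
        return nn.Sequential(
            nn.Conv2d(cfg.channels, hidden, 1), nn.GELU(),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden), nn.GELU(),
            nn.Conv2d(hidden, cfg.channels, 1),
        )
    if cfg.op_type == "transformer":
        # Self-attention block on (B, N, C) token sequences.
        return nn.TransformerEncoderLayer(
            d_model=cfg.channels, nhead=4, dim_feedforward=hidden,
            batch_first=True)
    # "mlp": channel-mixing block on (B, N, C) token sequences.
    return nn.Sequential(
        nn.Linear(cfg.channels, hidden), nn.GELU(),
        nn.Linear(hidden, cfg.channels),
    )

block = build_op(OpConfig(op_type="conv", channels=64, expansion=4))
```

Because every operator family reduces to the same (type, width, expansion) tuple, the search controller only has to emit one compact configuration per stage instead of exploring three incompatible spaces, which is what keeps the joint search cost affordable.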
Related papers
- Differentiable Model Scaling using Differentiable Topk [12.084701778797854]
This study introduces Differentiable Model Scaling (DMS), which increases the efficiency of searching for the optimal width and depth of a network.
Results consistently indicate that our DMS can find improved structures and outperforms state-of-the-art NAS methods.
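A toy reading of the DMS idea, assuming a smooth top-k channel mask with a learnable threshold; the exact differentiable top-k operator in the paper may differ.

```python
import torch

def soft_topk_mask(importance: torch.Tensor, threshold: torch.Tensor,
                   temperature: float = 0.1) -> torch.Tensor:
    # Smoothly keep channels whose importance exceeds a learnable threshold;
    # gradients reach both inputs, so the effective width is trainable.
    return torch.sigmoid((importance - threshold) / temperature)

importance = torch.randn(64, requires_grad=True)  # one score per channel
threshold = torch.zeros(1, requires_grad=True)    # learnable cutoff
mask = soft_topk_mask(importance, threshold)
width_cost = mask.sum()                           # differentiable width penalty
width_cost.backward()
```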
arXiv Detail & Related papers (2024-05-12T07:34:33Z)
- SimQ-NAS: Simultaneous Quantization Policy and Neural Architecture Search [6.121126813817338]
Recent one-shot Neural Architecture Search algorithms rely on training a hardware-agnostic super-network tailored to a specific task and then extracting efficient sub-networks for different hardware platforms.
We show that by using multi-objective search algorithms paired with lightly trained predictors, we can efficiently search for both the sub-network architecture and the corresponding quantization policy.
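As a hedged illustration of the joint search described above, the sketch below pairs a made-up accuracy predictor with a Pareto filter over (architecture, quantization policy) candidates; the encodings, predictor, and cost model are all invented for the example.

```python
import random

random.seed(0)
WIDTHS, BITS, NUM_LAYERS = [32, 64, 96], [4, 8], 6

def sample():
    # A candidate couples a sub-network architecture with a quantization policy.
    return ([random.choice(WIDTHS) for _ in range(NUM_LAYERS)],
            [random.choice(BITS) for _ in range(NUM_LAYERS)])

def predicted_accuracy(arch, quant):
    # Stand-in for a lightly trained accuracy predictor.
    return sum(w ** 0.5 + b for w, b in zip(arch, quant))

def cost(arch, quant):
    # Proxy hardware cost: bit-width-scaled multiply-accumulates.
    return sum(w * b for w, b in zip(arch, quant))

def dominates(a, b):
    return (predicted_accuracy(*a) >= predicted_accuracy(*b)
            and cost(*a) <= cost(*b)
            and (predicted_accuracy(*a) > predicted_accuracy(*b)
                 or cost(*a) < cost(*b)))

candidates = [sample() for _ in range(300)]
pareto = [c for c in candidates if not any(dominates(o, c) for o in candidates)]
print(len(pareto), "candidates on the predicted accuracy/cost front")
```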
arXiv Detail & Related papers (2023-12-19T22:08:49Z)
- Efficient Deep Spiking Multi-Layer Perceptrons with Multiplication-Free Inference [13.924924047051782]
Deep convolution architectures for Spiking Neural Networks (SNNs) have significantly enhanced image classification performance and reduced computational burdens.
This research explores a new pathway, drawing inspiration from the progress made in Multi-Layer Perceptrons (MLPs).
We propose an innovative spiking architecture that uses batch normalization to retain MFI compatibility.
We establish an efficient multi-stage spiking network that effectively blends global receptive fields with local feature extraction.
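One way to see the multiplication-free inference (MFI) constraint mentioned above: with binary spikes, a linear layer only ever sums selected weight entries. The sketch below is an illustrative reading of the abstract, not the paper's architecture; plausibly, a batch-norm's affine transform can be folded into the firing threshold at inference, which would preserve MFI.

```python
import torch

def spiking_linear(spikes: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # With {0, 1} spikes, spikes @ W.T reduces to summing selected weight
    # entries, so inference needs additions only.
    assert ((spikes == 0) | (spikes == 1)).all()
    return spikes @ weight.t()

x = (torch.rand(8, 128) > 0.7).float()  # random binary spike batch
w = torch.randn(256, 128)
membrane = spiking_linear(x, w)
spikes_out = (membrane > 1.0).float()   # hard firing threshold
```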
arXiv Detail & Related papers (2023-06-21T16:52:20Z)
- Systematic Architectural Design of Scale Transformed Attention Condenser DNNs via Multi-Scale Class Representational Response Similarity Analysis [93.0013343535411]
We propose a novel type of analysis called Multi-Scale Class Representational Response Similarity Analysis (ClassRepSim).
We show that adding STAC modules to ResNet style architectures can result in up to a 1.6% increase in top-1 accuracy.
Results from ClassRepSim analysis can be used to select an effective parameterization of the STAC module resulting in competitive performance.
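A rough sketch of one plausible form of class representational response similarity: per-class mean activations at a layer, compared pairwise by cosine similarity. The function name and details are guesses from the abstract, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def class_response_similarity(feats, labels, num_classes):
    # Per-class mean response at one layer, then a pairwise cosine-similarity
    # matrix across classes (assumes every class appears in the batch).
    means = torch.stack([feats[labels == c].mean(dim=0)
                         for c in range(num_classes)])
    means = F.normalize(means, dim=1)
    return means @ means.t()

feats = torch.randn(512, 64)           # pooled activations for 512 samples
labels = torch.randint(0, 10, (512,))  # 10 classes
print(class_response_similarity(feats, labels, 10).shape)  # (10, 10)
```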
arXiv Detail & Related papers (2023-06-16T18:29:26Z)
- Pruning Self-attentions into Convolutional Layers in Single Path [89.55361659622305]
Vision Transformers (ViTs) have achieved impressive performance over various computer vision tasks.
We propose Single-Path Vision Transformer pruning (SPViT) to efficiently and automatically compress the pre-trained ViTs.
Our SPViT can trim 52.0% FLOPs for DeiT-B and get an impressive 0.6% top-1 accuracy gain simultaneously.
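A toy single-path relaxation of the attention-vs-convolution choice: one learnable gate mixes the two branches during search and is binarized afterward so the losing branch can be pruned. SPViT itself derives the convolutional weights from the attention parameters rather than keeping two independent branches, so this is only a simplified illustration.

```python
import torch
import torch.nn as nn

class AttnOrConv(nn.Module):
    # Sigmoid-gated mix of self-attention and a depthwise convolution;
    # after search the gate is rounded and the unused branch removed.
    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.conv = nn.Conv1d(dim, dim, 3, padding=1, groups=dim)
        self.alpha = nn.Parameter(torch.zeros(1))  # architecture parameter

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        gate = torch.sigmoid(self.alpha)
        attn_out, _ = self.attn(x, x, x)
        conv_out = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return gate * attn_out + (1 - gate) * conv_out

print(AttnOrConv(64)(torch.randn(2, 16, 64)).shape)  # (2, 16, 64)
```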
arXiv Detail & Related papers (2021-11-23T11:35:54Z)
- UniNet: Unified Architecture Search with Convolution, Transformer, and MLP [62.401161377258234]
In this paper, we propose to jointly search the optimal combination of convolution, transformer, and MLP for building a series of all-operator network architectures.
We identify that the widely-used strided convolution or pooling based down-sampling modules become the performance bottlenecks when operators are combined to form a network.
To better tackle the global context captured by the transformer and MLP operators, we propose two novel context-aware down-sampling modules.
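A minimal sketch of the general idea behind a context-aware down-sampling module: fuse a local strided convolution with a globally pooled context gate, so the resolution change does not discard the global information that transformer and MLP blocks rely on. The module name and composition are illustrative assumptions, not the paper's two DSM designs.

```python
import torch
import torch.nn as nn

class ContextAwareDownsample(nn.Module):
    # Local strided conv modulated by a global-context channel gate.
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.local = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)
        self.context = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # squeeze to a global context vector
            nn.Conv2d(in_ch, out_ch, 1),
            nn.Sigmoid(),              # channel-wise gate in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.local(x) * self.context(x)  # gate broadcasts over H, W

x = torch.randn(1, 64, 56, 56)
print(ContextAwareDownsample(64, 128)(x).shape)  # (1, 128, 28, 28)
```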
arXiv Detail & Related papers (2021-10-08T11:09:40Z)
- DAAS: Differentiable Architecture and Augmentation Policy Search [107.53318939844422]
This work considers the possible coupling between neural architectures and data augmentation and proposes an effective algorithm jointly searching for them.
Our approach achieves 97.91% accuracy on CIFAR-10 and 76.6% Top-1 accuracy on the ImageNet dataset, showing the outstanding performance of our search algorithm.
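A toy version of coupling the two searches: architecture choices and augmentation-policy choices are both relaxed into softmax weights and updated by the same optimizer against one validation objective. The surrogate loss below stands in for a supernet's validation loss and is invented for the example.

```python
import torch

arch_logits = torch.zeros(3, requires_grad=True)  # 3 candidate ops
aug_logits = torch.zeros(4, requires_grad=True)   # 4 candidate augmentations
opt = torch.optim.Adam([arch_logits, aug_logits], lr=0.1)

def surrogate_val_loss(arch_w, aug_w):
    # Stand-in for the supernet's validation loss under the sampled policy.
    target_arch = torch.tensor([0.7, 0.2, 0.1])
    target_aug = torch.tensor([0.1, 0.4, 0.4, 0.1])
    return ((arch_w - target_arch) ** 2).sum() + ((aug_w - target_aug) ** 2).sum()

for _ in range(100):
    opt.zero_grad()
    loss = surrogate_val_loss(arch_logits.softmax(0), aug_logits.softmax(0))
    loss.backward()
    opt.step()

print(arch_logits.softmax(0), aug_logits.softmax(0))  # jointly tuned choices
```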
arXiv Detail & Related papers (2021-09-30T17:15:17Z)
- One-Shot Neural Ensemble Architecture Search by Diversity-Guided Search Space Shrinking [97.60915598958968]
We propose a one-shot neural ensemble architecture search (NEAS) solution that addresses the two challenges.
For the first challenge, we introduce a novel diversity-based metric to guide search space shrinking.
For the second challenge, we enable a new search dimension to learn layer sharing among different models for efficiency purposes.
arXiv Detail & Related papers (2021-04-01T16:29:49Z)
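For the NEAS entry above, a hedged sketch of what a diversity-based shrinking metric could look like: score each candidate operator by how often its predictions disagree with the others', then keep the most complementary ones. The disagreement measure is an illustrative choice, not the paper's metric.

```python
import torch

def disagreement(a: torch.Tensor, b: torch.Tensor) -> float:
    # Fraction of samples where two candidates' argmax predictions differ.
    return (a.argmax(1) != b.argmax(1)).float().mean().item()

# Hypothetical cached predictions of 4 candidate ops on a held-out batch.
preds = [torch.randn(256, 10) for _ in range(4)]

scores = [sum(disagreement(p, q) for q in preds if q is not p) / (len(preds) - 1)
          for p in preds]
keep = sorted(range(len(preds)), key=lambda i: scores[i], reverse=True)[:2]
print("ops kept after diversity-guided shrinking:", keep)
```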