UniNet: Unified Architecture Search with Convolution, Transformer, and
MLP
- URL: http://arxiv.org/abs/2110.04035v1
- Date: Fri, 8 Oct 2021 11:09:40 GMT
- Title: UniNet: Unified Architecture Search with Convolution, Transformer, and
MLP
- Authors: Jihao Liu and Hongsheng Li and Guanglu Song and Xin Huang and Yu Liu
- Abstract summary: In this paper, we propose to jointly search the optimal combination of convolution, transformer, and MLP for building a series of all-operator network architectures.
We identify that the widely-used strided convolution or pooling based down-sampling modules become the performance bottlenecks when operators are combined to form a network.
To better handle the global context captured by the transformer and MLP operators, we propose two novel context-aware down-sampling modules.
- Score: 62.401161377258234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, transformer and multi-layer perceptron (MLP) architectures have
achieved impressive results on various vision tasks. A few works investigated
manually combining those operators to design visual network architectures, and
can achieve satisfactory performances to some extent. In this paper, we propose
to jointly search the optimal combination of convolution, transformer, and MLP
for building a series of all-operator network architectures with high
performances on visual tasks. We empirically identify that the widely-used
strided convolution or pooling based down-sampling modules become the
performance bottlenecks when the operators are combined to form a network. To
better tackle the global context captured by the transformer and MLP operators,
we propose two novel context-aware down-sampling modules, which can better
adapt to the global information encoded by transformer and MLP operators. To
this end, we jointly search all operators and down-sampling modules in a
unified search space. Notably, our searched network UniNet (Unified Network)
outperforms state-of-the-art pure convolution-based architecture, EfficientNet,
and pure transformer-based architecture, Swin-Transformer, on multiple public
visual benchmarks: ImageNet classification, COCO object detection, and ADE20K
semantic segmentation.
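The abstract's key observation is that a strided convolution only mixes a small local window when reducing resolution, which discards the global context built up by transformer and MLP blocks. A context-aware down-sampling step can instead let each lower-resolution output attend over the entire input. The sketch below is a hypothetical illustration of that idea only; the averaged-window queries, single-head attention, and shapes are assumptions for clarity, not the paper's actual DSM design.

```python
import math

def attention_downsample(tokens, stride=2):
    """Halve sequence length: each output token is an attention-weighted
    mix of ALL input tokens, not just a local window.

    tokens: list of feature vectors (lists of floats), length divisible by stride.
    """
    dim = len(tokens[0])
    outputs = []
    for start in range(0, len(tokens), stride):
        # Query: mean of the local window (a cheap stand-in for a learned query).
        window = tokens[start:start + stride]
        query = [sum(v[d] for v in window) / stride for d in range(dim)]
        # Scaled dot-product attention over the *entire* input sequence,
        # so global context survives the resolution drop.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append([sum(w * v[d] for w, v in zip(weights, tokens))
                        for d in range(dim)])
    return outputs

# 8 tokens of dim 4 -> 4 tokens of dim 4
seq = [[float(i == j % 4) for i in range(4)] for j in range(8)]
down = attention_downsample(seq)
print(len(down), len(down[0]))
```

A strided convolution would compute each output from its window alone; here the window only shapes the query, while the values come from the whole sequence.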
Related papers
- CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection [1.837431956557716]
Feature pyramids have been widely adopted in convolutional neural networks (CNNs) and transformers for tasks like medical image segmentation and object detection.
We propose a novel decoder block that integrates feature pyramids and transformers.
Our model achieves superior performance in detecting small objects compared to existing methods.
arXiv Detail & Related papers (2024-04-23T18:46:07Z)
- Transformer-based Context Condensation for Boosting Feature Pyramids in Object Detection [77.50110439560152]
Current object detectors typically have a feature pyramid (FP) module for multi-level feature fusion (MFF)
We propose a novel and efficient context modeling mechanism that can help existing FPs deliver better MFF results.
In particular, we introduce a novel insight that comprehensive contexts can be decomposed and condensed into two types of representations for higher efficiency.
arXiv Detail & Related papers (2022-07-14T01:45:03Z)
- UniNet: Unified Architecture Search with Convolution, Transformer, and MLP [39.489331136395535]
We propose a novel unified architecture search approach for high-performance networks.
First, we model the very different searchable operators in a unified form.
Second, we propose context-aware downsampling modules (DSMs) to mitigate the gap between the different types of operators.
Third, we integrate operators and DSMs into a unified search space and search with a Reinforcement Learning-based search algorithm.
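The three steps above can be sketched as a toy REINFORCE loop: a controller keeps one softmax policy per stage over the unified operator choices, samples architectures, and nudges the logits toward higher-reward samples. Everything below (the stage count, the proxy reward, the hyperparameters, and the controller itself) is an illustrative assumption, not the paper's actual search implementation.

```python
import math
import random

OPERATORS = ["conv", "transformer", "mlp"]  # unified searchable operators
NUM_STAGES = 4                              # backbone stages (assumed)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_reward(arch, target):
    # Stand-in for validation accuracy: fraction of stages matching a
    # (hypothetical) best layout. A real search would train and evaluate.
    return sum(a == t for a, t in zip(arch, target)) / len(target)

def search(iterations=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    target = ["conv", "conv", "transformer", "mlp"]  # assumed optimum
    logits = [[0.0] * len(OPERATORS) for _ in range(NUM_STAGES)]
    baseline = 0.0
    for _ in range(iterations):
        probs = [softmax(l) for l in logits]
        choices = [rng.choices(range(len(OPERATORS)), weights=p)[0]
                   for p in probs]
        arch = [OPERATORS[c] for c in choices]
        reward = toy_reward(arch, target)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward  # running-mean baseline
        # REINFORCE: d log p(choice) / d logit_i = 1[i == choice] - p_i
        for l, p, c in zip(logits, probs, choices):
            for i in range(len(OPERATORS)):
                l[i] += lr * advantage * ((1.0 if i == c else 0.0) - p[i])
    greedy = [OPERATORS[max(range(len(OPERATORS)), key=l.__getitem__)]
              for l in logits]
    return greedy, target

best, target = search()
print(best)
```

In a real unified search the per-stage choices would also include the down-sampling modules (DSMs), and the reward would come from training and evaluating the sampled network rather than from a fixed target.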
arXiv Detail & Related papers (2022-07-12T09:30:58Z)
- Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method-Vision Transformer with Convolutions Architecture Search (VTCAS)
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
- Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z)
- A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation [79.265315267391]
We propose a simple and compact ViT architecture called Universal Vision Transformer (UViT)
UViT achieves strong performance on object detection and instance segmentation tasks.
arXiv Detail & Related papers (2021-12-17T20:11:56Z)
- A Survey of Visual Transformers [30.082304742571598]
Transformer, an attention-based encoder-decoder architecture, has revolutionized the field of natural language processing.
Some pioneering works have recently been done on adapting Transformer architectures to Computer Vision (CV) fields.
We have provided a comprehensive review of over one hundred different visual Transformers for three fundamental CV tasks.
arXiv Detail & Related papers (2021-11-11T07:56:04Z)
- Conformer: Local Features Coupling Global Representations for Visual Recognition [72.9550481476101]
We propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning.
Experiments show that Conformer, at comparable parameter complexity, outperforms the visual transformer DeiT-B by 2.3% on ImageNet.
arXiv Detail & Related papers (2021-05-09T10:00:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.