GLiT: Neural Architecture Search for Global and Local Image Transformer
- URL: http://arxiv.org/abs/2107.02960v1
- Date: Wed, 7 Jul 2021 00:48:09 GMT
- Title: GLiT: Neural Architecture Search for Global and Local Image Transformer
- Authors: Boyu Chen, Peixia Li, Chuming Li, Baopu Li, Lei Bai, Chen Lin, Ming
Sun, Junjie Yan, Wanli Ouyang
- Abstract summary: We introduce the first Neural Architecture Search (NAS) method to find a better transformer architecture for image recognition.
Our method finds transformer variants that are more discriminative and efficient than the ResNet family and the baseline ViT for image classification.
- Score: 114.8051035856023
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce the first Neural Architecture Search (NAS) method to find a
better transformer architecture for image recognition. Recently, transformers
without CNN-based backbones have been found to achieve impressive performance for
image recognition. However, the transformer is designed for NLP tasks and thus
can be sub-optimal when directly used for image recognition. To improve the
visual representation ability of transformers, we propose a new search space
and search algorithm. Specifically, we introduce a locality module that
explicitly models local correlations in images at lower computational cost.
With the locality module, our search space lets the search algorithm freely
trade off between global and local information as well as optimize the
low-level design choices in each module. To tackle the problem caused by the
huge search space, a hierarchical neural architecture search method is proposed
that searches for the optimal vision transformer at two levels separately with
an evolutionary algorithm. Extensive experiments on the ImageNet dataset
demonstrate that our method finds transformer variants that are more
discriminative and efficient than the ResNet family (e.g., ResNet101) and the
baseline ViT for image classification.
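To make the global-local trade-off in the search space concrete, below is a minimal PyTorch sketch of a transformer block whose heads are split between global self-attention and a convolutional locality branch. This is an illustration under assumptions, not the paper's implementation: the depthwise convolution stands in for the locality module, and `num_local_heads` exemplifies the kind of high-level choice the hierarchical search would optimize.

```python
# Hypothetical sketch (not the authors' code): a block whose feature channels
# are split between global self-attention heads and a cheap convolutional
# locality branch; num_local_heads is a searchable design choice.
import torch
import torch.nn as nn

class GlobalLocalBlock(nn.Module):
    def __init__(self, dim=192, num_heads=6, num_local_heads=3, kernel_size=3):
        super().__init__()
        assert dim % num_heads == 0
        head_dim = dim // num_heads
        self.g_dim = (num_heads - num_local_heads) * head_dim
        l_dim = num_local_heads * head_dim
        # Global branch: standard multi-head self-attention over all tokens.
        self.attn = nn.MultiheadAttention(self.g_dim, num_heads - num_local_heads,
                                          batch_first=True) if self.g_dim else None
        # Local branch: a depthwise 1D conv models short-range correlations
        # between neighbouring tokens at lower cost than attention.
        self.local = nn.Conv1d(l_dim, l_dim, kernel_size,
                               padding=kernel_size // 2, groups=l_dim) if l_dim else None
        self.norm = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (batch, tokens, dim)
        h = self.norm(x)
        outs = []
        if self.attn is not None:
            g = h[..., :self.g_dim]
            outs.append(self.attn(g, g, g)[0])
        if self.local is not None:
            l = h[..., self.g_dim:].transpose(1, 2)   # (batch, channels, tokens)
            outs.append(self.local(l).transpose(1, 2))
        return x + self.proj(torch.cat(outs, dim=-1))

tokens = torch.randn(2, 197, 192)               # ViT-style patch tokens
print(GlobalLocalBlock()(tokens).shape)         # torch.Size([2, 197, 192])
```

In a setup like this, varying the global-to-local head ratio per block is one way a search algorithm could shift capacity toward local correlations in some layers and global context in others.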
Related papers
- SRTransGAN: Image Super-Resolution using Transformer based Generative
Adversarial Network [16.243363392717434]
We propose a transformer-based encoder-decoder network as a generator to generate 2x and 4x super-resolved images.
The proposed SRTransGAN outperforms existing methods by 4.38% on average in terms of PSNR and SSIM scores.
arXiv Detail & Related papers (2023-12-04T16:22:39Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training (see the sketch after this entry).
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
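The warping operation an AAT-style module would learn can be pictured with PyTorch's affine grid sampling. This is a hedged sketch, not the AC-Former code; in the real model the 2x3 affine parameters would be regressed per image from image features rather than held as a single learnable tensor.

```python
# Illustrative affine warp (not AC-Former's implementation): a learnable
# 2x3 transform is applied to an image batch via grid sampling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineWarp(nn.Module):
    def __init__(self):
        super().__init__()
        # Initialize at the identity transform; a real module would predict
        # these parameters from a small regression network.
        self.theta = nn.Parameter(torch.tensor([[1., 0., 0.],
                                                [0., 1., 0.]]))

    def forward(self, x):                       # x: (batch, C, H, W)
        theta = self.theta.expand(x.size(0), 2, 3)
        grid = F.affine_grid(theta, x.shape, align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

imgs = torch.randn(4, 3, 64, 64)
print(AffineWarp()(imgs).shape)                 # torch.Size([4, 3, 64, 64])
```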
- Evolutionary Neural Architecture Search for Transformer in Knowledge Tracing [8.779571123401185]
This paper proposes an evolutionary neural architecture search approach that automates input feature selection and determines where to apply which operation to balance local and global context modelling (a generic evolutionary search loop is sketched after this entry).
Experimental results on the two largest and most challenging education datasets demonstrate the effectiveness of the architecture found by the proposed approach.
arXiv Detail & Related papers (2023-10-02T13:19:33Z)
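Both this entry and GLiT rely on an evolutionary algorithm to navigate their search spaces. Below is a generic sketch of that loop; the architecture encoding, mutation operator, and fitness function are placeholders, not the setup of either paper.

```python
# Generic evolutionary architecture search skeleton (illustrative only).
import random

random.seed(0)
SEARCH_SPACE = {"num_local_heads": [0, 2, 4, 6], "depth": [8, 12, 16]}

def sample():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def mutate(arch):
    child = dict(arch)
    key = random.choice(list(SEARCH_SPACE))
    child[key] = random.choice(SEARCH_SPACE[key])
    return child

def fitness(arch):
    # Placeholder objective; a real search would train or evaluate the
    # candidate (e.g., via a weight-sharing supernet) and return accuracy.
    return -abs(arch["num_local_heads"] - 4) - abs(arch["depth"] - 12)

population = [sample() for _ in range(20)]
for generation in range(10):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                    # keep the fittest candidates
    population = parents + [mutate(random.choice(parents)) for _ in range(15)]

print(max(population, key=fitness))             # best architecture found
```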
- Transformer-Guided Convolutional Neural Network for Cross-View Geolocalization [20.435023745201878]
We propose a novel Transformer-guided convolutional neural network (TransGCNN) architecture.
Our TransGCNN consists of a CNN backbone that extracts a feature map from the input image and a Transformer head that models global context (see the sketch after this entry).
Experiments on popular benchmark datasets demonstrate that our model achieves top-1 accuracy of 94.12% and 84.92% on CVUSA and CVACT_val, respectively.
arXiv Detail & Related papers (2022-04-21T08:46:41Z)
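The backbone-plus-head split described above is a common hybrid pattern; here is a hedged PyTorch sketch of it. The ResNet-18 backbone, layer sizes, and classification head are assumptions for illustration, not TransGCNN's actual design (which targets cross-view image retrieval).

```python
# Hedged sketch of a CNN backbone feeding a Transformer head (illustrative;
# sizes and the ResNet-18 backbone are assumptions, not TransGCNN's design).
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ConvBackboneTransformerHead(nn.Module):
    def __init__(self, dim=256, num_classes=1000):
        super().__init__()
        cnn = resnet18(weights=None)
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])  # (B, 512, h, w)
        self.proj = nn.Conv2d(512, dim, kernel_size=1)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                       # x: (B, 3, H, W)
        f = self.proj(self.backbone(x))         # local features from the CNN
        tokens = f.flatten(2).transpose(1, 2)   # (B, h*w, dim) token sequence
        g = self.encoder(tokens)                # global context via attention
        return self.head(g.mean(dim=1))         # pool tokens and classify

print(ConvBackboneTransformerHead()(torch.randn(2, 3, 224, 224)).shape)
```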
- Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
- Searching the Search Space of Vision Transformer [98.96601221383209]
Vision Transformer has shown great visual representation power in a broad range of vision tasks such as recognition and detection.
We propose to use neural architecture search to automate this process, by searching not only the architecture but also the search space.
We provide design guidelines of general vision transformers with extensive analysis according to the space searching process.
arXiv Detail & Related papers (2021-11-29T17:26:07Z)
- UniNet: Unified Architecture Search with Convolution, Transformer, and MLP [62.401161377258234]
In this paper, we propose to jointly search the optimal combination of convolution, transformer, and MLP for building a series of all-operator network architectures.
We identify that the widely used strided-convolution or pooling-based down-sampling modules become performance bottlenecks when operators are combined to form a network.
To better tackle the global context captured by the transformer and MLP operators, we propose two novel context-aware down-sampling modules.
arXiv Detail & Related papers (2021-10-08T11:09:40Z)
- NAS-DIP: Learning Deep Image Prior with Neural Architecture Search [65.79109790446257]
Recent work has shown that the structure of deep convolutional neural networks can be used as a structured image prior.
We propose to search for neural architectures that capture stronger image priors.
We search for an improved network by leveraging an existing neural architecture search algorithm (a minimal deep-image-prior loop is sketched after this entry).
arXiv Detail & Related papers (2020-08-26T17:59:36Z)
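The deep image prior idea behind NAS-DIP fits in a few lines: an untrained CNN is optimized to reproduce a single degraded image, and, stopped early, its output favors natural image structure over noise. A minimal sketch with a placeholder network follows; NAS-DIP searches for architectures whose prior is stronger than such a hand-picked one.

```python
# Minimal deep image prior loop (illustrative; the tiny network is a
# placeholder for the architectures NAS-DIP would search over).
import torch
import torch.nn as nn

torch.manual_seed(0)
noisy = torch.rand(1, 3, 64, 64)                # stand-in for a noisy image

net = nn.Sequential(
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
)
z = torch.randn(1, 32, 64, 64)                  # fixed random input code
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):                         # early stopping matters: the
    opt.zero_grad()                             # net fits structure first,
    loss = ((net(z) - noisy) ** 2).mean()       # noise only much later
    loss.backward()
    opt.step()

denoised = net(z).detach()                      # restored image estimate
print(denoised.shape)                           # torch.Size([1, 3, 64, 64])
```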