Searching for Efficient Multi-Stage Vision Transformers
- URL: http://arxiv.org/abs/2109.00642v1
- Date: Wed, 1 Sep 2021 22:37:56 GMT
- Title: Searching for Efficient Multi-Stage Vision Transformers
- Authors: Yi-Lun Liao and Sertac Karaman and Vivienne Sze
- Abstract summary: Vision Transformer (ViT) demonstrates that Transformers developed for natural language processing can be applied to computer vision tasks.
ViT-ResNAS is an efficient multi-stage ViT architecture designed with neural architecture search (NAS).
- Score: 42.0565109812926
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision Transformer (ViT) demonstrates that Transformers developed
for natural language processing can be applied to computer vision tasks and
achieve performance comparable to convolutional neural networks (CNNs), which
have been studied and adopted in computer vision for years. This naturally
raises the question of how the performance of ViT can be advanced with CNN
design techniques. To this end, we propose to incorporate two techniques and
present ViT-ResNAS, an efficient multi-stage ViT architecture designed with
neural architecture search (NAS). First, we propose residual spatial reduction
to decrease sequence lengths for deeper layers and utilize a multi-stage
architecture. When reducing lengths, we add skip connections to improve
performance and stabilize the training of deeper networks. Second, we propose
weight-sharing NAS with multi-architectural sampling. We enlarge a network and
use its sub-networks to define a search space. A super-network covering all
sub-networks is then trained for fast evaluation of their performance. To
train the super-network efficiently, we sample and train multiple sub-networks
with one forward-backward pass. After that, evolutionary search is performed
to discover high-performance network architectures. Experiments on ImageNet
demonstrate that ViT-ResNAS achieves better accuracy-MACs and
accuracy-throughput trade-offs than the original DeiT and other strong ViT
baselines. Code is available at https://github.com/yilunliao/vit-search.
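
To make the residual spatial reduction idea concrete, here is a minimal
PyTorch-style sketch. It assumes tokens are laid out as (batch, sequence,
channels) with an even spatial grid, and it omits details such as class-token
handling; it illustrates the general pattern (a strided reduction plus a
pooled, projected skip path), not the authors' exact module.

```python
import torch
import torch.nn as nn

class ResidualSpatialReduction(nn.Module):
    """Halve the spatial resolution of the token sequence between stages,
    with a skip connection to stabilize the training of deeper networks."""

    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        # Main path: strided convolution merges neighboring tokens.
        self.reduce = nn.Conv2d(dim_in, dim_out, kernel_size=3, stride=2, padding=1)
        # Skip path: average-pool tokens and project channels so the shortcut
        # matches the reduced sequence length and the new width.
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)
        self.proj = nn.Linear(dim_in, dim_out)
        self.norm = nn.LayerNorm(dim_out)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (batch, n, dim_in) with n == h * w
        b, n, c = x.shape
        grid = x.transpose(1, 2).reshape(b, c, h, w)
        main = self.reduce(grid)                      # (b, dim_out, h//2, w//2)
        skip = self.proj(self.pool(grid).flatten(2).transpose(1, 2))
        out = main.flatten(2).transpose(1, 2) + skip  # sequence length n // 4
        return self.norm(out)
```

After this block the token sequence is four times shorter, so deeper stages
attend over fewer tokens, typically at a larger channel width.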
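
The multi-architectural sampling step can be sketched as follows. This is a
hedged approximation of the idea, not the paper's exact scheme: each
mini-batch is partitioned across several randomly sampled sub-networks, their
losses are summed, and backward is called once, so the shared weights receive
gradients from multiple architectures per update. The `supernet(inputs,
config)` interface and the depth/width search space below are hypothetical
placeholders.

```python
import random
import torch.nn.functional as F

# Hypothetical search space: per-stage depth and a width multiplier.
SEARCH_SPACE = {
    "depth_stage1": [2, 3, 4],
    "depth_stage2": [4, 6, 8],
    "width_mult":   [0.75, 1.0, 1.25],
}

def sample_subnetwork(search_space):
    """Draw one sub-network configuration uniformly at random."""
    return {key: random.choice(choices) for key, choices in search_space.items()}

def train_step(supernet, optimizer, images, labels, num_samples=4):
    """One update of the shared weights using several sampled sub-networks."""
    optimizer.zero_grad()
    total_loss = 0.0
    # Give each sampled sub-network its own slice of the mini-batch.
    for x, y in zip(images.chunk(num_samples), labels.chunk(num_samples)):
        config = sample_subnetwork(SEARCH_SPACE)
        logits = supernet(x, config)          # run only the active sub-network
        total_loss = total_loss + F.cross_entropy(logits, y)
    total_loss.backward()                     # a single backward pass
    optimizer.step()
    return float(total_loss.detach())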
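
Finally, once the super-network is trained, a simple evolutionary loop can
rank sub-networks using the shared weights. The sketch below keeps the fittest
half of the population and mutates it each generation; `evaluate` is an
assumed validation-accuracy callback, and the paper's actual selection scheme
and constraints (e.g., a MACs budget) are omitted.

```python
import random

def evolutionary_search(supernet, search_space, evaluate,
                        population=50, generations=20, mutate_prob=0.1):
    """Search for high-accuracy sub-networks with the trained super-network.

    `evaluate(supernet, config)` is assumed to return validation accuracy for
    the sub-network `config` evaluated with the shared super-network weights.
    """
    pool = [{k: random.choice(v) for k, v in search_space.items()}
            for _ in range(population)]
    for _ in range(generations):
        # Rank the population by proxy accuracy and keep the fittest half.
        pool.sort(key=lambda cfg: evaluate(supernet, cfg), reverse=True)
        parents = pool[: population // 2]
        children = []
        for parent in parents:
            child = dict(parent)
            for key in child:
                if random.random() < mutate_prob:   # per-gene random mutation
                    child[key] = random.choice(search_space[key])
            children.append(child)
        pool = parents + children
    return max(pool, key=lambda cfg: evaluate(supernet, cfg))
```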
Related papers
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated superior accuracy on computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z)
- Pruning-as-Search: Efficient Neural Architecture Search via Channel Pruning and Structural Reparameterization [50.50023451369742]
Pruning-as-Search (PaS) is an end-to-end channel pruning method that searches out desired sub-networks automatically and efficiently.
The proposed architecture outperforms prior art by around 1.0% top-1 accuracy on the ImageNet-1000 classification task.
arXiv Detail & Related papers (2022-06-02T17:58:54Z)
- Evolutionary Neural Cascade Search across Supernetworks [68.8204255655161]
We introduce ENCAS - Evolutionary Neural Cascade Search.
ENCAS can be used to search over multiple pretrained supernetworks.
We test ENCAS on common computer vision benchmarks.
arXiv Detail & Related papers (2022-03-08T11:06:01Z)
- A Hardware-Aware System for Accelerating Deep Neural Network Optimization [7.189421078452572]
We propose a comprehensive system that automatically and efficiently finds sub-networks from a pre-trained super-network.
By combining novel search tactics and algorithms with intelligent use of predictors, we significantly decrease the time needed to find optimal sub-networks.
arXiv Detail & Related papers (2022-02-25T20:07:29Z)
- SuperShaper: Task-Agnostic Super Pre-training of BERT Models with Variable Hidden Dimensions [2.8583189395674653]
SuperShaper is a task-agnostic pre-training approach for NLU models.
It simultaneously pre-trains a large number of Transformer models by varying their shapes.
SuperShaper discovers networks that effectively trade off accuracy and model size.
arXiv Detail & Related papers (2021-10-10T05:44:02Z)
- PV-NAS: Practical Neural Architecture Search for Video Recognition [83.77236063613579]
Deep neural networks for video tasks are highly customized, and designing such networks requires domain experts and costly trial-and-error tests.
Recent advances in network architecture search have boosted image recognition performance by a large margin.
In this study, we propose a practical solution, namely Practical Video Neural Architecture Search (PV-NAS).
arXiv Detail & Related papers (2020-11-02T08:50:23Z)
- Hierarchical Neural Architecture Search for Deep Stereo Matching [131.94481111956853]
We propose the first end-to-end hierarchical NAS framework for deep stereo matching.
Our framework incorporates task-specific human knowledge into the neural architecture search framework.
It ranks first in accuracy on the KITTI Stereo 2012 and 2015 and Middlebury benchmarks, as well as on the SceneFlow dataset.
arXiv Detail & Related papers (2020-10-26T11:57:37Z)
- FNA++: Fast Network Adaptation via Parameter Remapping and Architecture Search [35.61441231491448]
We propose a Fast Network Adaptation (FNA++) method, which can adapt both the architecture and parameters of a seed network.
In our experiments, we apply FNA++ to MobileNetV2 to obtain new networks for semantic segmentation, object detection, and human pose estimation.
The total computation cost of FNA++ is significantly less than that of SOTA segmentation and detection NAS approaches.
arXiv Detail & Related papers (2020-06-21T10:03:34Z)
- Fast Neural Network Adaptation via Parameter Remapping and Architecture Search [35.61441231491448]
Deep neural networks achieve remarkable performance in many computer vision tasks.
Most state-of-the-art (SOTA) semantic segmentation and object detection approaches reuse neural network architectures designed for image classification as the backbone.
One major challenge, though, is that ImageNet pre-training of the search space representation incurs a huge computational cost.
In this paper, we propose a Fast Neural Network Adaptation (FNA) method, which can adapt both the architecture and parameters of a seed network.
arXiv Detail & Related papers (2020-01-08T13:45:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.