Scaling Up Quantization-Aware Neural Architecture Search for Efficient
Deep Learning on the Edge
- URL: http://arxiv.org/abs/2401.12350v1
- Date: Mon, 22 Jan 2024 20:32:31 GMT
- Title: Scaling Up Quantization-Aware Neural Architecture Search for Efficient
Deep Learning on the Edge
- Authors: Yao Lu, Hiram Rayo Torres Rodriguez, Sebastian Vogel, Nick van de
Waterlaat, Pavol Jancura
- Abstract summary: We present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale tasks by leveraging the block-wise formulation introduced by block-wise NAS.
We demonstrate strong results for the semantic segmentation task on the Cityscapes dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than DeepLabV3 (INT8) without compromising task performance.
- Score: 3.1878884714257008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Architecture Search (NAS) has become the de facto approach for
designing accurate and efficient networks for edge devices. Since models are
typically quantized for edge deployment, recent work has investigated
quantization-aware NAS (QA-NAS) to search for highly accurate and efficient
quantized models. However, existing QA-NAS approaches, particularly few-bit
mixed-precision (FB-MP) methods, do not scale to larger tasks. Consequently,
QA-NAS has mostly been limited to low-scale tasks and tiny networks. In this
work, we present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale
tasks by leveraging the block-wise formulation introduced by block-wise NAS. We
demonstrate strong results for the semantic segmentation task on the Cityscapes
dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than
DeepLabV3 (INT8) without compromising task performance.
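To make the block-wise quantization-aware search idea concrete, below is a minimal PyTorch-style sketch, not the authors' implementation: each block of a pretrained full-precision teacher supervises a pool of fake-quantized candidate sub-blocks with a feature-reconstruction loss, after which one candidate per block is selected under a cost budget. The `FakeQuantize` module, the candidate pools, the data loader interface, and the greedy selection are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FakeQuantize(nn.Module):
    """Symmetric per-tensor fake quantization with a straight-through estimator.
    Illustrative stand-in for INT8 / few-bit mixed-precision quantization."""
    def __init__(self, num_bits=8):
        super().__init__()
        self.num_bits = num_bits

    def forward(self, x):
        qmax = 2 ** (self.num_bits - 1) - 1
        scale = x.detach().abs().max().clamp(min=1e-8) / qmax
        x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
        return x + (x_q - x).detach()  # straight-through gradient

def quantized_candidate(in_ch, out_ch, num_bits=8):
    """One illustrative candidate sub-block: conv + fake-quantized activations."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        FakeQuantize(num_bits))

def train_block_candidates(teacher_blocks, candidate_pools, data_loader, epochs=1):
    """Block-wise supervision: every quantized candidate learns to reproduce
    the output of its corresponding full-precision teacher block."""
    for block_idx, candidates in enumerate(candidate_pools):
        optimizer = torch.optim.Adam(
            [p for cand in candidates for p in cand.parameters()], lr=1e-3)
        for _ in range(epochs):
            for x, _ in data_loader:
                with torch.no_grad():
                    # Teacher features entering and leaving this block.
                    feat_in = x
                    for tb in teacher_blocks[:block_idx]:
                        feat_in = tb(feat_in)
                    feat_out = teacher_blocks[block_idx](feat_in)
                loss = sum(nn.functional.mse_loss(cand(feat_in), feat_out)
                           for cand in candidates)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

def select_architecture(block_losses, block_costs, budget):
    """Greedy selection sketch: per block, pick the lowest-loss candidate whose
    accumulated cost (e.g. model size or latency) stays within the budget."""
    chosen, spent = [], 0.0
    for losses, costs in zip(block_losses, block_costs):
        ranked = sorted(range(len(losses)), key=lambda i: losses[i])
        pick = next((i for i in ranked if spent + costs[i] <= budget),
                    min(range(len(costs)), key=lambda i: costs[i]))
        spent += costs[pick]
        chosen.append(pick)
    return chosen
```

For a few-bit mixed-precision (FB-MP) variant of this sketch, candidates at different bit-widths would simply enter the same per-block pool, and the selection step would trade off their reconstruction losses against their bit-dependent costs.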
Related papers
- Delta-NAS: Difference of Architecture Encoding for Predictor-based Evolutionary Neural Architecture Search [5.1331676121360985]
We craft an algorithm capable of performing fine-grained NAS at low cost.
We propose projecting the problem to a lower-dimensional space by predicting the difference in accuracy between a pair of similar networks (see the predictor sketch after this list).
arXiv Detail & Related papers (2024-11-21T02:43:32Z)
- DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions [121.05720140641189]
We develop a family of models with the distilling neural architecture (DNA) techniques.
Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub-search space using heuristic algorithms.
Our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively.
arXiv Detail & Related papers (2024-03-02T22:16:47Z)
- Lightweight Neural Architecture Search for Temporal Convolutional Networks at the Edge [21.72253397805102]
This work focuses in particular on Temporal Convolutional Networks (TCNs), a convolutional model for time-series processing.
We propose the first NAS tool that explicitly targets the optimization of the most peculiar architectural parameters of TCNs.
We test the proposed NAS on four real-world, edge-relevant tasks, involving audio and bio-signals.
arXiv Detail & Related papers (2023-01-24T19:47:40Z)
- Efficient Architecture Search for Diverse Tasks [29.83517145790238]
We study neural architecture search (NAS) for efficiently solving diverse problems.
We introduce DASH, a differentiable NAS algorithm that computes the mixture-of-operations using the Fourier diagonalization of convolution.
We evaluate DASH on NAS-Bench-360, a suite of ten tasks designed for NAS benchmarking in diverse domains.
arXiv Detail & Related papers (2022-04-15T17:21:27Z)
- NAS-FCOS: Efficient Search for Object Detection Architectures [113.47766862146389]
We propose an efficient method to obtain better object detectors by searching for the feature pyramid network (FPN) and the prediction head of a simple anchor-free object detector.
With a carefully designed search space, search algorithms, and strategies for evaluating network quality, we are able to find top-performing detection architectures within 4 days using 8 V100 GPUs.
arXiv Detail & Related papers (2021-10-24T12:20:04Z)
- Searching Efficient Model-guided Deep Network for Image Denoising [61.65776576769698]
We present a novel approach by connecting model-guided design with NAS (MoD-NAS).
MoD-NAS employs a highly reusable width search strategy and a densely connected search block to automatically select the operations of each layer.
Experimental results on several popular datasets show that our MoD-NAS has achieved even better PSNR performance than current state-of-the-art methods.
arXiv Detail & Related papers (2021-04-06T14:03:01Z)
- Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search [112.05977301976613]
We propose combining Network Architecture Search methods with quantization to enjoy the merits of both sides.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- FrostNet: Towards Quantization-Aware Network Architecture Search [8.713741951284886]
We present a new network architecture search (NAS) procedure to find a network that guarantees both full-precision (FLOAT32) and quantized (INT8) performances.
Our FrostNets achieve higher recognition accuracy than existing CNNs with comparable latency when quantized.
arXiv Detail & Related papers (2020-06-17T06:40:43Z)
- DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search [76.9225014200746]
Efficient search is a core issue in Neural Architecture Search (NAS).
We present DA-NAS, which can directly search architectures for large-scale target tasks while allowing a large candidate set in a more efficient manner.
It is 2x faster than previous methods while achieving state-of-the-art accuracy of 76.2% under a small FLOPs constraint.
arXiv Detail & Related papers (2020-03-27T17:55:21Z)
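Regarding the predictor-based entry above (Delta-NAS), the sketch below illustrates the general idea of predicting the accuracy difference between a pair of similar architecture encodings rather than absolute accuracy. The encoding scheme, the MLP predictor, and all names here are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DeltaAccuracyPredictor(nn.Module):
    """Predicts accuracy(arch_a) - accuracy(arch_b) from the difference of
    their (e.g. one-hot) architecture encodings. Illustrative sketch only."""
    def __init__(self, encoding_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(encoding_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, enc_a, enc_b):
        return self.net(enc_a - enc_b).squeeze(-1)

def train_predictor(pairs, epochs=100, lr=1e-3):
    """pairs: list of (enc_a, enc_b, acc_a - acc_b) tensors built from a small
    set of fully evaluated architectures."""
    dim = pairs[0][0].numel()
    model = DeltaAccuracyPredictor(dim)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for enc_a, enc_b, delta in pairs:
            loss = nn.functional.mse_loss(model(enc_a, enc_b), delta)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

In an evolutionary loop, such a predictor only needs to rank a mutated child against its parent, which is presumably how the difference formulation keeps the search cost low.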
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.