Scaling Up Quantization-Aware Neural Architecture Search for Efficient
Deep Learning on the Edge
- URL: http://arxiv.org/abs/2401.12350v1
- Date: Mon, 22 Jan 2024 20:32:31 GMT
- Title: Scaling Up Quantization-Aware Neural Architecture Search for Efficient
Deep Learning on the Edge
- Authors: Yao Lu, Hiram Rayo Torres Rodriguez, Sebastian Vogel, Nick van de
Waterlaat, Pavol Jancura
- Abstract summary: We present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale tasks by leveraging the block-wise formulation introduced by block-wise NAS.
We demonstrate strong results for the semantic segmentation task on the Cityscapes dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than DeepLabV3 (INT8) without compromising task performance.
- Score: 3.1878884714257008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Architecture Search (NAS) has become the de facto approach for
designing accurate and efficient networks for edge devices. Since models are
typically quantized for edge deployment, recent work has investigated
quantization-aware NAS (QA-NAS) to search for highly accurate and efficient
quantized models. However, existing QA-NAS approaches, particularly few-bit
mixed-precision (FB-MP) methods, do not scale to larger tasks. Consequently,
QA-NAS has mostly been limited to low-scale tasks and tiny networks. In this
work, we present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale
tasks by leveraging the block-wise formulation introduced by block-wise NAS. We
demonstrate strong results for the semantic segmentation task on the Cityscapes
dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than
DeepLabV3 (INT8) without compromising task performance.
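To make the block-wise quantization-aware search idea concrete, below is a minimal PyTorch-style sketch, not the authors' implementation: each block of a pretrained full-precision teacher supervises a pool of fake-quantized candidate sub-blocks with a feature-reconstruction loss, after which one candidate per block is selected under a cost budget. The `FakeQuantize` module, the candidate pools, the data loader interface, and the greedy selection are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FakeQuantize(nn.Module):
    """Symmetric per-tensor fake quantization with a straight-through estimator.
    Illustrative stand-in for INT8 / few-bit mixed-precision quantization."""
    def __init__(self, num_bits=8):
        super().__init__()
        self.num_bits = num_bits

    def forward(self, x):
        qmax = 2 ** (self.num_bits - 1) - 1
        scale = x.detach().abs().max().clamp(min=1e-8) / qmax
        x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
        return x + (x_q - x).detach()  # straight-through gradient

def quantized_candidate(in_ch, out_ch, num_bits=8):
    """One illustrative candidate sub-block: conv + fake-quantized activations."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        FakeQuantize(num_bits))

def train_block_candidates(teacher_blocks, candidate_pools, data_loader, epochs=1):
    """Block-wise supervision: every quantized candidate learns to reproduce
    the output of its corresponding full-precision teacher block."""
    for block_idx, candidates in enumerate(candidate_pools):
        optimizer = torch.optim.Adam(
            [p for cand in candidates for p in cand.parameters()], lr=1e-3)
        for _ in range(epochs):
            for x, _ in data_loader:
                with torch.no_grad():
                    # Teacher features entering and leaving this block.
                    feat_in = x
                    for tb in teacher_blocks[:block_idx]:
                        feat_in = tb(feat_in)
                    feat_out = teacher_blocks[block_idx](feat_in)
                loss = sum(nn.functional.mse_loss(cand(feat_in), feat_out)
                           for cand in candidates)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

def select_architecture(block_losses, block_costs, budget):
    """Greedy selection sketch: per block, pick the lowest-loss candidate whose
    accumulated cost (e.g. model size or latency) stays within the budget."""
    chosen, spent = [], 0.0
    for losses, costs in zip(block_losses, block_costs):
        ranked = sorted(range(len(losses)), key=lambda i: losses[i])
        pick = next((i for i in ranked if spent + costs[i] <= budget),
                    min(range(len(costs)), key=lambda i: costs[i]))
        spent += costs[pick]
        chosen.append(pick)
    return chosen
```

For a few-bit mixed-precision (FB-MP) variant of this sketch, candidates at different bit-widths would simply enter the same per-block pool, and the selection step would trade off their reconstruction losses against their bit-dependent costs.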
Related papers
- Delta-NAS: Difference of Architecture Encoding for Predictor-based Evolutionary Neural Architecture Search [5.1331676121360985]
We craft an algorithm capable of performing fine-grained NAS at low cost.
We propose projecting the problem to a lower-dimensional space by predicting the difference in accuracy between a pair of similar networks (see the predictor sketch after this list).
arXiv Detail & Related papers (2024-11-21T02:43:32Z)
- DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions [121.05720140641189]
We develop a family of models with the distilling neural architecture (DNA) techniques.
Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub-search space using heuristic algorithms.
Our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively.
arXiv Detail & Related papers (2024-03-02T22:16:47Z)
- Lightweight Neural Architecture Search for Temporal Convolutional Networks at the Edge [21.72253397805102]
This work focuses in particular on Temporal Convolutional Networks (TCNs), a convolutional model for time-series processing.
We propose the first NAS tool that explicitly targets the optimization of the most peculiar architectural parameters of TCNs.
We test the proposed NAS on four real-world, edge-relevant tasks, involving audio and bio-signals.
arXiv Detail & Related papers (2023-01-24T19:47:40Z)
- Efficient Architecture Search for Diverse Tasks [29.83517145790238]
We study neural architecture search (NAS) for efficiently solving diverse problems.
We introduce DASH, a differentiable NAS algorithm that computes the mixture-of-operations using the Fourier diagonalization of convolution.
We evaluate DASH on NAS-Bench-360, a suite of ten tasks designed for NAS benchmarking in diverse domains.
arXiv Detail & Related papers (2022-04-15T17:21:27Z)
- NAS-FCOS: Efficient Search for Object Detection Architectures [113.47766862146389]
We propose an efficient method to obtain better object detectors by searching for the feature pyramid network (FPN) and the prediction head of a simple anchor-free object detector.
With a carefully designed search space, search algorithms, and strategies for evaluating network quality, we are able to find top-performing detection architectures within 4 days using 8 V100 GPUs.
arXiv Detail & Related papers (2021-10-24T12:20:04Z)
- Searching Efficient Model-guided Deep Network for Image Denoising [61.65776576769698]
We present a novel approach by connecting model-guided design with NAS (MoD-NAS).
MoD-NAS employs a highly reusable width search strategy and a densely connected search block to automatically select the operations of each layer.
Experimental results on several popular datasets show that our MoD-NAS has achieved even better PSNR performance than current state-of-the-art methods.
arXiv Detail & Related papers (2021-04-06T14:03:01Z)
- Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search [112.05977301976613]
We propose combining Network Architecture Search methods with quantization to enjoy the merits of both sides.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- FrostNet: Towards Quantization-Aware Network Architecture Search [8.713741951284886]
We present a new network architecture search (NAS) procedure to find a network that guarantees both full-precision (FLOAT32) and quantized (INT8) performances.
Our FrostNets achieve higher recognition accuracy than existing CNNs with comparable latency when quantized.
arXiv Detail & Related papers (2020-06-17T06:40:43Z)
- DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search [76.9225014200746]
Efficient search is a core issue in Neural Architecture Search (NAS).
We present DA-NAS, which can directly search architectures for large-scale target tasks while allowing a large candidate set in a more efficient manner.
It is 2x faster than previous methods while achieving state-of-the-art accuracy of 76.2% under a small FLOPs constraint.
arXiv Detail & Related papers (2020-03-27T17:55:21Z)
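Regarding the predictor-based entry above (Delta-NAS), the sketch below illustrates the general idea of predicting the accuracy difference between a pair of similar architecture encodings rather than absolute accuracy. The encoding scheme, the MLP predictor, and all names here are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DeltaAccuracyPredictor(nn.Module):
    """Predicts accuracy(arch_a) - accuracy(arch_b) from the difference of
    their (e.g. one-hot) architecture encodings. Illustrative sketch only."""
    def __init__(self, encoding_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(encoding_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, enc_a, enc_b):
        return self.net(enc_a - enc_b).squeeze(-1)

def train_predictor(pairs, epochs=100, lr=1e-3):
    """pairs: list of (enc_a, enc_b, acc_a - acc_b) tensors built from a small
    set of fully evaluated architectures."""
    dim = pairs[0][0].numel()
    model = DeltaAccuracyPredictor(dim)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for enc_a, enc_b, delta in pairs:
            loss = nn.functional.mse_loss(model(enc_a, enc_b), delta)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

In an evolutionary loop, such a predictor only needs to rank a mutated child against its parent, which is presumably how the difference formulation keeps the search cost low.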
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.