SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8
Inference
- URL: http://arxiv.org/abs/2303.08308v1
- Date: Wed, 15 Mar 2023 01:41:21 GMT
- Title: SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8
Inference
- Authors: Li Lyna Zhang, Xudong Wang, Jiahang Xu, Quanlu Zhang, Yujing Wang,
Yuqing Yang, Ningxin Zheng, Ting Cao, Mao Yang
- Abstract summary: We propose SpaceEvo, an automatic method for designing a dedicated, quantization-friendly search space for each target hardware.
We show that SpaceEvo consistently outperforms existing manually-designed search spaces with up to 2.5x faster speed while achieving the same accuracy.
- Score: 15.94147346105013
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The combination of Neural Architecture Search (NAS) and quantization has
proven successful in automatically designing low-FLOPs INT8 quantized neural
networks (QNN). However, directly applying NAS to design accurate QNN models
that achieve low latency on real-world devices leads to inferior performance.
In this work, we find that the poor INT8 latency is due to the
quantization-unfriendly issue: the operator and configuration (e.g., channel
width) choices in prior art search spaces lead to diverse quantization
efficiency and can slow down the INT8 inference speed. To address this
challenge, we propose SpaceEvo, an automatic method for designing a dedicated,
quantization-friendly search space for each target hardware. The key idea of
SpaceEvo is to automatically search hardware-preferred operators and
configurations to construct the search space, guided by a metric called Q-T
score to quantify how quantization-friendly a candidate search space is. We
further train a quantized-for-all supernet over our discovered search space,
enabling the searched models to be directly deployed without extra retraining
or quantization. Our discovered models establish new SOTA INT8 quantized
accuracy under various latency constraints, achieving up to 10.1% accuracy
improvement on ImageNet over prior-art CNNs under the same latency. Extensive
experiments on diverse edge devices demonstrate that SpaceEvo consistently
outperforms existing manually-designed search spaces with up to 2.5x faster
speed while achieving the same accuracy.
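As a rough illustration of the search-space evolution described in the abstract, here is a minimal Python sketch of an evolutionary loop that scores candidate search spaces with a Q-T-style metric. The operator pool, the mutation scheme, the qt_score definition (quality over predicted INT8 latency of sampled sub-networks), and the predict_quant_accuracy / predict_int8_latency callbacks are illustrative assumptions, not the paper's actual formulation.

```python
import random

# Hypothetical operator pool and width options; the real candidates are
# hardware-dependent and come from the paper's supernet design.
OPS = ["mbv2_k3", "mbv2_k5", "fused_mb", "resnet_block"]
WIDTHS = [16, 24, 32, 48, 64, 96, 128]

def random_space(num_stages=6):
    """A candidate search space: per-stage operator type and width choices."""
    return [(random.choice(OPS), sorted(random.sample(WIDTHS, 3)))
            for _ in range(num_stages)]

def mutate(space):
    """Mutate one stage's operator or width choices."""
    child = list(space)
    i = random.randrange(len(child))
    op, widths = child[i]
    if random.random() < 0.5:
        op = random.choice(OPS)
    else:
        widths = sorted(random.sample(WIDTHS, 3))
    child[i] = (op, widths)
    return child

def qt_score(space, predict_int8_latency, predict_quant_accuracy, n_samples=20):
    """Illustrative stand-in for the Q-T score: reward spaces whose sampled
    sub-networks are both accurate and fast under INT8 quantization."""
    total = 0.0
    for _ in range(n_samples):
        arch = [(op, random.choice(widths)) for op, widths in space]
        acc = predict_quant_accuracy(arch)   # hypothetical accuracy predictor
        lat = predict_int8_latency(arch)     # hypothetical device latency predictor
        total += acc / lat
    return total / n_samples

def evolve_search_space(predict_lat, predict_acc, pop_size=16, generations=30):
    population = [random_space() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population,
                        key=lambda s: qt_score(s, predict_lat, predict_acc),
                        reverse=True)
        parents = scored[: pop_size // 4]
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - len(parents))]
    return max(population, key=lambda s: qt_score(s, predict_lat, predict_acc))
```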
Related papers
- ISQuant: apply squant to the real deployment [0.0]
We analyze why the combination of quantization and dequantization is used to train the model.
We propose ISQuant as a solution for deploying 8-bit models.
arXiv Detail & Related papers (2024-07-05T15:10:05Z)
- Scaling Up Quantization-Aware Neural Architecture Search for Efficient Deep Learning on the Edge [3.1878884714257008]
We present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale tasks by leveraging the block-wise formulation introduced by block-wise NAS.
We demonstrate strong results for the semantic segmentation task on the Cityscapes dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than DeepLabV3 (INT8) without compromising task performance.
arXiv Detail & Related papers (2024-01-22T20:32:31Z)
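Below is a rough sketch of the block-wise supernet training that the QA-NAS entry above builds on: each searchable block is trained in isolation to mimic the corresponding block of a pretrained teacher, so blocks can be trained and scored independently. In the quantization-aware setting each candidate block would additionally be trained with fake quantization, which is omitted here; the function name and training loop are hypothetical, and sampling one candidate per block per step is only noted in a comment.

```python
import torch
import torch.nn.functional as F

def train_blocks_independently(supernet_blocks, teacher_blocks, loader,
                               steps=1000, lr=1e-3):
    """Block-wise distillation: each block regresses its teacher block's
    features, and teacher features are fed forward to the next block."""
    opts = [torch.optim.Adam(b.parameters(), lr=lr) for b in supernet_blocks]
    for _, (x, _) in zip(range(steps), loader):
        feat = x
        for teacher, block, opt in zip(teacher_blocks, supernet_blocks, opts):
            with torch.no_grad():
                target = teacher(feat)        # teacher features are the block's target
            pred = block(feat)                # in practice, one sampled candidate per step
            loss = F.mse_loss(pred, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
            feat = target                     # next block sees teacher features
    return supernet_blocks
```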
- Multi-Predict: Few Shot Predictors For Efficient Neural Architecture Search [10.538869116366415]
We introduce a novel search-space independent NN encoding based on zero-cost proxies that achieves sample-efficient prediction on multiple tasks and NAS search spaces.
Our NN encoding enables multi-search-space transfer of latency predictors from NASBench-201 to FBNet in under 85 HW measurements.
arXiv Detail & Related papers (2023-06-04T20:22:14Z)
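Below is a generic sketch of the few-shot latency-predictor idea from the Multi-Predict entry above: an architecture is mapped to a fixed-length, search-space-independent feature vector (assumed here to be a handful of zero-cost proxy scores) and a small MLP regresses latency, so transfer to a new search space or device amounts to fine-tuning on a few measurements. The encoding, layer sizes, and fit routine are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LatencyPredictor(nn.Module):
    """Small MLP over a fixed-length, search-space-independent encoding."""
    def __init__(self, n_features=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def fit(predictor, feats, latencies, epochs=200, lr=1e-3):
    """Fit (or few-shot fine-tune) the predictor on measured latencies."""
    opt = torch.optim.Adam(predictor.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(predictor(feats), latencies)
        loss.backward()
        opt.step()
    return predictor

# Usage sketch: pretrain on many (encoding, latency) pairs from one search
# space, then fine-tune on a handful of measurements from a new space/device.
# pred = fit(LatencyPredictor(), source_feats, source_latencies)
# pred = fit(pred, few_target_feats, few_target_latencies, epochs=50)
```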
- BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search [100.28980854978768]
We present Block-wisely Self-supervised Neural Architecture Search (BossNAS).
We factorize the search space into blocks and utilize a novel self-supervised training scheme, named ensemble bootstrapping, to train each block separately.
We also present HyTra search space, a fabric-like hybrid CNN-transformer search space with searchable down-sampling positions.
arXiv Detail & Related papers (2021-03-23T10:05:58Z)
- AutoSpace: Neural Architecture Search with Less Human Interference [84.42680793945007]
Current neural architecture search (NAS) algorithms still require expert knowledge and effort to design a search space for network construction.
We propose a novel differentiable evolutionary framework named AutoSpace, which evolves the search space to an optimal one.
With the learned search space, the performance of recent NAS algorithms can be improved significantly compared with using previously manually designed spaces.
arXiv Detail & Related papers (2021-03-22T13:28:56Z)
- Evolving Search Space for Neural Architecture Search [70.71153433676024]
We present a Neural Search-space Evolution (NSE) scheme that amplifies the results from the previous effort by maintaining an optimized search space subset.
We achieve 77.3% top-1 retrain accuracy on ImageNet with 333M FLOPs, yielding state-of-the-art performance.
When a latency constraint is adopted, our result also outperforms the previous best-performing mobile models, with 77.9% top-1 retrain accuracy.
arXiv Detail & Related papers (2020-11-22T01:11:19Z)
- Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of the two sides.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to lower bit-widths, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z)
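Below is a minimal sketch of the shared-step-size and bit-inheritance ideas from the Once Quantization-Aware Training summary above: an LSQ-style fake quantizer whose single learnable step is shared by the sampled sub-networks, plus a helper that reuses the learned step when switching to a lower bit-width. The straight-through estimator and the step rescaling on bit change are illustrative assumptions, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

class SharedStepQuantizer(nn.Module):
    """Fake quantizer with one learnable step size shared across sub-networks."""
    def __init__(self, bits=4, step_init=0.1):
        super().__init__()
        self.step = nn.Parameter(torch.tensor(step_init))
        self.set_bits(bits)

    def set_bits(self, bits):
        self.qmin, self.qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

    def forward(self, w):
        q = (w / self.step).clamp(self.qmin, self.qmax)
        q = q + (q.round() - q).detach()   # straight-through estimator for round()
        return q * self.step

def inherit_bits(quantizer, new_bits):
    """Bit inheritance: start the lower-bit quantizer from the higher-bit step
    instead of retraining from scratch (the rescaling rule is an assumption)."""
    old_levels = quantizer.qmax - quantizer.qmin
    quantizer.set_bits(new_bits)
    new_levels = quantizer.qmax - quantizer.qmin
    with torch.no_grad():
        quantizer.step.mul_(old_levels / new_levels)  # keep a similar covered range
    return quantizer
```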
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach to reduce the search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- Learned Low Precision Graph Neural Networks [10.269500440688306]
We show how to systematically quantise Deep Graph Neural Networks (GNNs) with minimal or no loss in performance using Network Architecture Search (NAS).
The proposed novel NAS mechanism, named Low Precision Graph NAS (LPGNAS), constrains both architecture and quantisation choices to be differentiable.
On eight different datasets, solving the task of classifying unseen nodes in a graph, LPGNAS generates quantised models with significant reductions in both model and buffer sizes.
arXiv Detail & Related papers (2020-09-19T13:51:09Z)
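Below is a generic sketch of how quantisation choices can be made differentiable, in the spirit of the LPGNAS entry above, assuming a DARTS-style relaxation (LPGNAS's exact parameterization may differ): each layer mixes candidate bit-widths with softmax weights over learnable logits, and the strongest candidate is selected after training.

```python
import torch
import torch.nn as nn

def fake_quant(w, bits):
    """Symmetric uniform fake quantization to `bits` (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    q = (w / scale).clamp(-qmax - 1, qmax)
    q = q + (q.round() - q).detach()       # straight-through estimator
    return q * scale

class DifferentiableBitChoice(nn.Module):
    """Mix candidate bit-widths with softmax weights over learnable logits,
    making the quantisation choice differentiable."""
    def __init__(self, candidate_bits=(2, 4, 8)):
        super().__init__()
        self.candidate_bits = candidate_bits
        self.logits = nn.Parameter(torch.zeros(len(candidate_bits)))

    def forward(self, w):
        probs = torch.softmax(self.logits, dim=0)
        return sum(p * fake_quant(w, b)
                   for p, b in zip(probs, self.candidate_bits))

# After training, the highest-probability bit-width is selected per layer:
# best_bits = choice.candidate_bits[int(choice.logits.argmax())]
```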
- LC-NAS: Latency Constrained Neural Architecture Search for Point Cloud Networks [73.78551758828294]
LC-NAS is able to find state-of-the-art architectures for point cloud classification with minimal computational cost.
We show how our searched architectures achieve any desired latency with a reasonably low drop in accuracy.
arXiv Detail & Related papers (2020-08-24T10:30:21Z)
- FrostNet: Towards Quantization-Aware Network Architecture Search [8.713741951284886]
We present a new network architecture search (NAS) procedure to find a network that guarantees both full-precision (FLOAT32) and quantized (INT8) performances.
Our FrostNets achieve higher recognition accuracy than existing CNNs with comparable latency when quantized.
arXiv Detail & Related papers (2020-06-17T06:40:43Z)