SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8
Inference
- URL: http://arxiv.org/abs/2303.08308v1
- Date: Wed, 15 Mar 2023 01:41:21 GMT
- Title: SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8
Inference
- Authors: Li Lyna Zhang, Xudong Wang, Jiahang Xu, Quanlu Zhang, Yujing Wang,
Yuqing Yang, Ningxin Zheng, Ting Cao, Mao Yang
- Abstract summary: We propose SpaceEvo, an automatic method for designing a dedicated, quantization-friendly search space for each target hardware.
We show that SpaceEvo consistently outperforms existing manually-designed search spaces with up to 2.5x faster speed while achieving the same accuracy.
- Score: 15.94147346105013
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The combination of Neural Architecture Search (NAS) and quantization has
proven successful in automatically designing low-FLOPs INT8 quantized neural
networks (QNN). However, directly applying NAS to design accurate QNN models
that achieve low latency on real-world devices leads to inferior performance.
In this work, we find that the poor INT8 latency is due to the
quantization-unfriendly issue: the operator and configuration (e.g., channel
width) choices in prior art search spaces lead to diverse quantization
efficiency and can slow down the INT8 inference speed. To address this
challenge, we propose SpaceEvo, an automatic method for designing a dedicated,
quantization-friendly search space for each target hardware. The key idea of
SpaceEvo is to automatically search hardware-preferred operators and
configurations to construct the search space, guided by a metric called Q-T
score to quantify how quantization-friendly a candidate search space is. We
further train a quantized-for-all supernet over our discovered search space,
enabling the searched models to be directly deployed without extra retraining
or quantization. Our discovered models establish new SOTA INT8 quantized
accuracy under various latency constraints, achieving up to 10.1% accuracy
improvement on ImageNet over prior-art CNNs under the same latency. Extensive
experiments on diverse edge devices demonstrate that SpaceEvo consistently
outperforms existing manually-designed search spaces with up to 2.5x faster
speed while achieving the same accuracy.
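As a rough illustration of the search-space evolution described in the abstract, here is a minimal Python sketch of an evolutionary loop that scores candidate search spaces with a Q-T-style metric. The operator pool, the mutation scheme, the qt_score definition (quality over predicted INT8 latency of sampled sub-networks), and the predict_quant_accuracy / predict_int8_latency callbacks are illustrative assumptions, not the paper's actual formulation.

```python
import random

# Hypothetical operator pool and width options; the real candidates are
# hardware-dependent and come from the paper's supernet design.
OPS = ["mbv2_k3", "mbv2_k5", "fused_mb", "resnet_block"]
WIDTHS = [16, 24, 32, 48, 64, 96, 128]

def random_space(num_stages=6):
    """A candidate search space: per-stage operator type and width choices."""
    return [(random.choice(OPS), sorted(random.sample(WIDTHS, 3)))
            for _ in range(num_stages)]

def mutate(space):
    """Mutate one stage's operator or width choices."""
    child = list(space)
    i = random.randrange(len(child))
    op, widths = child[i]
    if random.random() < 0.5:
        op = random.choice(OPS)
    else:
        widths = sorted(random.sample(WIDTHS, 3))
    child[i] = (op, widths)
    return child

def qt_score(space, predict_int8_latency, predict_quant_accuracy, n_samples=20):
    """Illustrative stand-in for the Q-T score: reward spaces whose sampled
    sub-networks are both accurate and fast under INT8 quantization."""
    total = 0.0
    for _ in range(n_samples):
        arch = [(op, random.choice(widths)) for op, widths in space]
        acc = predict_quant_accuracy(arch)   # hypothetical accuracy predictor
        lat = predict_int8_latency(arch)     # hypothetical device latency predictor
        total += acc / lat
    return total / n_samples

def evolve_search_space(predict_lat, predict_acc, pop_size=16, generations=30):
    population = [random_space() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population,
                        key=lambda s: qt_score(s, predict_lat, predict_acc),
                        reverse=True)
        parents = scored[: pop_size // 4]
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - len(parents))]
    return max(population, key=lambda s: qt_score(s, predict_lat, predict_acc))
```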
Related papers
- ISQuant: apply squant to the real deployment [0.0]
We analyze why the combination of quantization and dequantization is used to train the model.
We propose ISQuant as a solution for deploying 8-bit models.
arXiv Detail & Related papers (2024-07-05T15:10:05Z)
- Scaling Up Quantization-Aware Neural Architecture Search for Efficient Deep Learning on the Edge [3.1878884714257008]
We present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale tasks by leveraging the block-wise formulation introduced by block-wise NAS.
We demonstrate strong results for the semantic segmentation task on the Cityscapes dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than DeepLabV3 (INT8) without compromising task performance.
arXiv Detail & Related papers (2024-01-22T20:32:31Z)
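Below is a rough sketch of the block-wise supernet training that the QA-NAS entry above builds on: each searchable block is trained in isolation to mimic the corresponding block of a pretrained teacher, so blocks can be trained and scored independently. In the quantization-aware setting each candidate block would additionally be trained with fake quantization, which is omitted here; the function name and training loop are hypothetical, and sampling one candidate per block per step is only noted in a comment.

```python
import torch
import torch.nn.functional as F

def train_blocks_independently(supernet_blocks, teacher_blocks, loader,
                               steps=1000, lr=1e-3):
    """Block-wise distillation: each block regresses its teacher block's
    features, and teacher features are fed forward to the next block."""
    opts = [torch.optim.Adam(b.parameters(), lr=lr) for b in supernet_blocks]
    for _, (x, _) in zip(range(steps), loader):
        feat = x
        for teacher, block, opt in zip(teacher_blocks, supernet_blocks, opts):
            with torch.no_grad():
                target = teacher(feat)        # teacher features are the block's target
            pred = block(feat)                # in practice, one sampled candidate per step
            loss = F.mse_loss(pred, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
            feat = target                     # next block sees teacher features
    return supernet_blocks
```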
- Multi-Predict: Few Shot Predictors For Efficient Neural Architecture Search [10.538869116366415]
We introduce a novel search-space independent NN encoding based on zero-cost proxies that achieves sample-efficient prediction on multiple tasks and NAS search spaces.
Our NN encoding enables multi-search-space transfer of latency predictors from NASBench-201 to FBNet in under 85 HW measurements.
arXiv Detail & Related papers (2023-06-04T20:22:14Z)
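Below is a generic sketch of the few-shot latency-predictor idea from the Multi-Predict entry above: an architecture is mapped to a fixed-length, search-space-independent feature vector (assumed here to be a handful of zero-cost proxy scores) and a small MLP regresses latency, so transfer to a new search space or device amounts to fine-tuning on a few measurements. The encoding, layer sizes, and fit routine are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LatencyPredictor(nn.Module):
    """Small MLP over a fixed-length, search-space-independent encoding."""
    def __init__(self, n_features=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def fit(predictor, feats, latencies, epochs=200, lr=1e-3):
    """Fit (or few-shot fine-tune) the predictor on measured latencies."""
    opt = torch.optim.Adam(predictor.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(predictor(feats), latencies)
        loss.backward()
        opt.step()
    return predictor

# Usage sketch: pretrain on many (encoding, latency) pairs from one search
# space, then fine-tune on a handful of measurements from a new space/device.
# pred = fit(LatencyPredictor(), source_feats, source_latencies)
# pred = fit(pred, few_target_feats, few_target_latencies, epochs=50)
```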
- BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search [100.28980854978768]
We present Block-wisely Self-supervised Neural Architecture Search (BossNAS).
We factorize the search space into blocks and utilize a novel self-supervised training scheme, named ensemble bootstrapping, to train each block separately.
We also present HyTra search space, a fabric-like hybrid CNN-transformer search space with searchable down-sampling positions.
arXiv Detail & Related papers (2021-03-23T10:05:58Z)
- AutoSpace: Neural Architecture Search with Less Human Interference [84.42680793945007]
Current neural architecture search (NAS) algorithms still require expert knowledge and effort to design a search space for network construction.
We propose a novel differentiable evolutionary framework named AutoSpace, which evolves the search space to an optimal one.
With the learned search space, the performance of recent NAS algorithms can be improved significantly compared with using previously manually designed spaces.
arXiv Detail & Related papers (2021-03-22T13:28:56Z)
- Evolving Search Space for Neural Architecture Search [70.71153433676024]
We present a Neural Search-space Evolution (NSE) scheme that amplifies the results from the previous effort by maintaining an optimized search space subset.
We achieve 77.3% top-1 retrain accuracy on ImageNet with 333M FLOPs, yielding state-of-the-art performance.
When a latency constraint is adopted, our result also outperforms the previous best-performing mobile models, with 77.9% top-1 retrain accuracy.
arXiv Detail & Related papers (2020-11-22T01:11:19Z)
- Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of the two sides.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to lower bit-widths, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z)
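Below is a minimal sketch of the shared-step-size and bit-inheritance ideas from the Once Quantization-Aware Training summary above: an LSQ-style fake quantizer whose single learnable step is shared by the sampled sub-networks, plus a helper that reuses the learned step when switching to a lower bit-width. The straight-through estimator and the step rescaling on bit change are illustrative assumptions, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

class SharedStepQuantizer(nn.Module):
    """Fake quantizer with one learnable step size shared across sub-networks."""
    def __init__(self, bits=4, step_init=0.1):
        super().__init__()
        self.step = nn.Parameter(torch.tensor(step_init))
        self.set_bits(bits)

    def set_bits(self, bits):
        self.qmin, self.qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

    def forward(self, w):
        q = (w / self.step).clamp(self.qmin, self.qmax)
        q = q + (q.round() - q).detach()   # straight-through estimator for round()
        return q * self.step

def inherit_bits(quantizer, new_bits):
    """Bit inheritance: start the lower-bit quantizer from the higher-bit step
    instead of retraining from scratch (the rescaling rule is an assumption)."""
    old_levels = quantizer.qmax - quantizer.qmin
    quantizer.set_bits(new_bits)
    new_levels = quantizer.qmax - quantizer.qmin
    with torch.no_grad():
        quantizer.step.mul_(old_levels / new_levels)  # keep a similar covered range
    return quantizer
```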
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach to reduce the search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- Learned Low Precision Graph Neural Networks [10.269500440688306]
We show how to systematically quantise Deep Graph Neural Networks (GNNs) with minimal or no loss in performance using Network Architecture Search (NAS).
The proposed novel NAS mechanism, named Low Precision Graph NAS (LPGNAS), constrains both architecture and quantisation choices to be differentiable.
On eight different datasets, solving the task of classifying unseen nodes in a graph, LPGNAS generates quantised models with significant reductions in both model and buffer sizes.
arXiv Detail & Related papers (2020-09-19T13:51:09Z)
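Below is a generic sketch of how quantisation choices can be made differentiable, in the spirit of the LPGNAS entry above, assuming a DARTS-style relaxation (LPGNAS's exact parameterization may differ): each layer mixes candidate bit-widths with softmax weights over learnable logits, and the strongest candidate is selected after training.

```python
import torch
import torch.nn as nn

def fake_quant(w, bits):
    """Symmetric uniform fake quantization to `bits` (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    q = (w / scale).clamp(-qmax - 1, qmax)
    q = q + (q.round() - q).detach()       # straight-through estimator
    return q * scale

class DifferentiableBitChoice(nn.Module):
    """Mix candidate bit-widths with softmax weights over learnable logits,
    making the quantisation choice differentiable."""
    def __init__(self, candidate_bits=(2, 4, 8)):
        super().__init__()
        self.candidate_bits = candidate_bits
        self.logits = nn.Parameter(torch.zeros(len(candidate_bits)))

    def forward(self, w):
        probs = torch.softmax(self.logits, dim=0)
        return sum(p * fake_quant(w, b)
                   for p, b in zip(probs, self.candidate_bits))

# After training, the highest-probability bit-width is selected per layer:
# best_bits = choice.candidate_bits[int(choice.logits.argmax())]
```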
- LC-NAS: Latency Constrained Neural Architecture Search for Point Cloud Networks [73.78551758828294]
LC-NAS is able to find state-of-the-art architectures for point cloud classification with minimal computational cost.
We show how our searched architectures achieve any desired latency with a reasonably low drop in accuracy.
arXiv Detail & Related papers (2020-08-24T10:30:21Z)
- FrostNet: Towards Quantization-Aware Network Architecture Search [8.713741951284886]
We present a new network architecture search (NAS) procedure to find a network that guarantees both full-precision (FLOAT32) and quantized (INT8) performances.
Our FrostNets achieve higher recognition accuracy than existing CNNs with comparable latency when quantized.
arXiv Detail & Related papers (2020-06-17T06:40:43Z)