Effective and Fast: A Novel Sequential Single Path Search for
Mixed-Precision Quantization
- URL: http://arxiv.org/abs/2103.02904v1
- Date: Thu, 4 Mar 2021 09:15:08 GMT
- Title: Effective and Fast: A Novel Sequential Single Path Search for
Mixed-Precision Quantization
- Authors: Qigong Sun, Licheng Jiao, Yan Ren, Xiufang Li, Fanhua Shang, Fang Liu
- Abstract summary: A mixed-precision quantization model can assign different quantization bit-precisions according to the sensitivity of each layer to achieve strong performance.
However, it is difficult to quickly determine the quantization bit-precision of each layer of a deep neural network under given constraints.
We propose a novel sequential single path search (SSPS) method for mixed-precision quantization.
- Score: 45.22093693422085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since model quantization helps to reduce the model size and computation
latency, it has been successfully applied on mobile phones, embedded devices
and smart chips. A mixed-precision quantization model can assign different
quantization bit-precisions according to the sensitivity of different layers
to achieve strong performance. However, it is difficult to quickly determine
the quantization bit-precision of each layer of a deep neural network under
given constraints (e.g., hardware resources, energy consumption, model size
and computation latency). To address this issue, we propose a novel
sequential single path search (SSPS) method for mixed-precision quantization,
in which the given constraints are introduced into the loss function to guide
the search process. A single path search cell is used to construct a fully
differentiable supernet, which can be optimized by gradient-based algorithms.
Moreover, we sequentially determine the candidate precisions according to
their selection certainties, which exponentially reduces the search space and
speeds up the convergence of the search. Experiments show that our method can
efficiently search mixed-precision models for different architectures (e.g.,
ResNet-20, 18, 34, 50 and MobileNet-V2) and datasets (e.g., CIFAR-10,
ImageNet and COCO) under the given constraints, and the searched models
significantly outperform their uniform-precision counterparts.
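To make the search procedure concrete, here is a minimal sketch in PyTorch of the three ingredients the abstract describes: a single-path cell whose candidate bit-widths are mixed by learnable selection logits, a loss term that folds a bit-width budget (standing in for the given constraints) into training, and a rule that sequentially freezes a layer's precision once its selection certainty is high. All names, thresholds and the fake-quantization routine are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch (not the authors' code): a single-path search cell that
# mixes candidate bit-widths with learnable logits, plus a constraint term in
# the loss and a "freeze when certain" rule, as the abstract describes.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w, bits):
    """Uniform symmetric fake quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

class SinglePathSearchCell(nn.Module):
    """One conv layer whose bit-width is selected by learnable logits."""
    def __init__(self, in_ch, out_ch, candidate_bits=(2, 4, 8)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.candidate_bits = list(candidate_bits)
        self.alpha = nn.Parameter(torch.zeros(len(candidate_bits)))  # selection logits
        self.frozen_bits = None  # set once the choice is certain

    def selection_probs(self):
        return F.softmax(self.alpha, dim=0)

    def forward(self, x):
        if self.frozen_bits is not None:
            return F.conv2d(x, fake_quantize(self.conv.weight, self.frozen_bits), padding=1)
        # Weighted sum of the same single path evaluated at each candidate precision.
        probs = self.selection_probs()
        w = sum(p * fake_quantize(self.conv.weight, b)
                for p, b in zip(probs, self.candidate_bits))
        return F.conv2d(x, w, padding=1)

    def expected_bits(self):
        if self.frozen_bits is not None:
            return torch.tensor(float(self.frozen_bits))
        return (self.selection_probs() *
                torch.tensor(self.candidate_bits, dtype=torch.float32)).sum()

    def maybe_freeze(self, certainty=0.9):
        """Sequentially fix this layer's precision once one candidate dominates."""
        probs = self.selection_probs()
        if self.frozen_bits is None and probs.max().item() > certainty:
            self.frozen_bits = self.candidate_bits[int(probs.argmax())]

def constrained_loss(task_loss, cells, target_avg_bits=4.0, lam=0.1):
    """Task loss plus a penalty pushing the expected average bit-width toward
    a budget (a stand-in for size/latency/energy constraints)."""
    avg_bits = torch.stack([c.expected_bits() for c in cells]).mean()
    return task_loss + lam * F.relu(avg_bits - target_avg_bits)

# Toy usage with a dummy task loss.
cells = [SinglePathSearchCell(3, 8), SinglePathSearchCell(8, 8)]
x = torch.randn(1, 3, 16, 16)
out = cells[1](cells[0](x))
loss = constrained_loss(out.pow(2).mean(), cells)
```

In the full method the constraint term would reflect hardware resources, energy consumption, model size or latency rather than the simple average-bit budget used here.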
Related papers
- Q-VLM: Post-training Quantization for Large Vision-Language Models [73.19871905102545]
We propose a post-training quantization framework of large vision-language models (LVLMs) for efficient multi-modal inference.
We mine the cross-layer dependency that significantly influences discretization errors of the entire vision-language model, and embed this dependency into the optimal quantization strategy.
Experimental results demonstrate that our method compresses memory by 2.78x and increases generation speed by 1.44x for the 13B LLaVA model without performance degradation.
arXiv Detail & Related papers (2024-10-10T17:02:48Z)
- FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search [50.07268323597872]
We propose the first one-shot mixed-precision quantization search that eliminates the need for retraining in both integer and low-precision floating point models.
With integer models, we increase the accuracy of ResNet-18 on ImageNet by 1.31% and ResNet-50 by 0.90% with equivalent model cost over previous methods.
For the first time, we explore a novel mixed-precision floating-point search and improve MobileNetV2 by up to 0.98% compared to prior state-of-the-art FP8 models.
arXiv Detail & Related papers (2023-08-07T04:17:19Z)
- Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge [17.277918711842457]
Mixed-precision quantization offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy.
This paper proposes a hybrid search methodology to navigate the search space of mixed-precision configurations for a given network.
It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware optimization to find mixed-precision configurations latency-optimized for a specific hardware target.
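As a rough illustration of the two-stage idea (hardware-agnostic differentiable search followed by hardware-aware selection), the toy sketch below scores candidate per-layer bit-width configurations against a profiled latency table; the layer names, probabilities and latencies are invented for the example and are not taken from the paper.

```python
# Minimal sketch (my illustration, not the paper's code): after a
# hardware-agnostic differentiable search produces per-layer bit-width
# probabilities, a hardware-aware pass picks the configuration that meets a
# measured latency budget. Names and the latency table are hypothetical.
from itertools import product

# Per-layer probability of each candidate bit-width from the differentiable stage.
probs = {
    "layer1": {2: 0.1, 4: 0.7, 8: 0.2},
    "layer2": {2: 0.5, 4: 0.3, 8: 0.2},
}
# Profiled latency in microseconds for each layer/bit-width pair.
latency_us = {
    "layer1": {2: 10.0, 4: 14.0, 8: 25.0},
    "layer2": {2: 30.0, 4: 42.0, 8: 70.0},
}

def best_config(probs, latency_us, budget_us):
    """Exhaustively score configurations: maximize the product of selection
    probabilities (a proxy for accuracy) under the latency budget."""
    layers = list(probs)
    best, best_score = None, -1.0
    for bits in product(*[probs[l].keys() for l in layers]):
        total_latency = sum(latency_us[l][b] for l, b in zip(layers, bits))
        if total_latency > budget_us:
            continue
        score = 1.0
        for l, b in zip(layers, bits):
            score *= probs[l][b]
        if score > best_score:
            best, best_score = dict(zip(layers, bits)), score
    return best

print(best_config(probs, latency_us, budget_us=60.0))  # {'layer1': 4, 'layer2': 2}
```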
arXiv Detail & Related papers (2023-07-06T09:57:48Z)
- QuantNAS for super resolution: searching for efficient quantization-friendly architectures against quantization noise [19.897685398009912]
We propose a novel quantization-aware search procedure, QuantNAS.
We use entropy regularization, quantization noise, and Adaptive Deviation for Quantization (ADQ) module to enhance the search procedure.
The proposed procedure is 30% faster than direct weight quantization and is more stable.
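The entropy regularization and quantization-noise ingredients mentioned above can be illustrated with a short, self-contained sketch; the exact formulation in QuantNAS may differ, and the functions below are my own simplified versions.

```python
# Hedged illustration of two ingredients the summary mentions: an entropy
# regularizer that pushes architecture probabilities toward a single choice,
# and additive quantization noise used as a differentiable proxy for
# quantization during the search. A minimal sketch, not QuantNAS itself.
import torch

def entropy_regularizer(logits):
    """Entropy of the softmax over candidate operations/bit-widths. Adding it
    (with a positive weight) to the loss discourages diffuse, undecided
    selections."""
    p = torch.softmax(logits, dim=-1)
    return -(p * torch.log(p + 1e-12)).sum(dim=-1).mean()

def add_quantization_noise(w, bits):
    """Simulate b-bit uniform quantization with uniform noise of one
    quantization step, keeping the forward pass fully differentiable."""
    step = (w.max() - w.min()).clamp(min=1e-8) / (2 ** bits - 1)
    return w + (torch.rand_like(w) - 0.5) * step

logits = torch.zeros(3, requires_grad=True)   # 3 candidate choices
loss = entropy_regularizer(logits)            # maximal entropy here
loss.backward()
print(loss.item())                            # ~log(3) = 1.0986
noisy = add_quantization_noise(torch.randn(4, 4), bits=4)
```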
arXiv Detail & Related papers (2022-08-31T13:12:16Z)
- SDQ: Stochastic Differentiable Quantization with Mixed Precision [46.232003346732064]
We present a novel Stochastic Differentiable Quantization (SDQ) method that can automatically learn the mixed-precision quantization (MPQ) strategy.
After the optimal MPQ strategy is acquired, we train our network with entropy-aware bin regularization and knowledge distillation.
SDQ outperforms all state-of-the-art mixed- and single-precision quantization methods with a lower bitwidth.
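The post-search training step described here combines a fixed mixed-precision network with knowledge distillation; the snippet below shows a standard distillation loss of the kind such a recipe would use (a generic sketch, not the SDQ implementation, and the temperature and weighting are arbitrary).

```python
# A minimal, generic sketch of the training recipe the summary mentions once a
# mixed-precision strategy is fixed: train the quantized student with a
# knowledge-distillation term against a full-precision teacher. This is a
# standard KD loss, not code from the SDQ paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Cross-entropy on hard labels plus KL divergence to the teacher's
    softened predictions (temperature T)."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * kd + (1.0 - alpha) * ce

student = torch.randn(8, 10)              # logits from the quantized network
teacher = torch.randn(8, 10)              # logits from the full-precision network
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student, teacher, labels).item())
```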
arXiv Detail & Related papers (2022-06-09T12:38:18Z)
- BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization [32.770842274996774]
Mixed-precision quantization can potentially achieve the optimal tradeoff between performance and compression rate of deep neural networks.
Previous methods either examine only a small manually-designed search space or utilize a cumbersome neural architecture search to explore the vast search space.
This work proposes bit-level sparsity quantization (BSQ) to tackle mixed-precision quantization from the new angle of inducing bit-level sparsity.
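A minimal sketch of what "inducing bit-level sparsity" can look like: decompose integer weights into bit planes and penalize the bits directly, so that entire high-order planes can be driven to zero. The decomposition and penalty below are simplified assumptions, not the BSQ training procedure.

```python
# Illustrative sketch (mine, not the BSQ implementation): split an
# integer-quantized weight tensor into its bit planes so a sparsity penalty
# can be applied at the bit level; planes that become all-zero can then be
# dropped, lowering that layer's effective precision.
import torch

def to_bit_planes(w_int, num_bits=8):
    """Split non-negative integer weights into `num_bits` binary planes,
    least-significant bit first."""
    planes = [((w_int >> b) & 1).float() for b in range(num_bits)]
    return torch.stack(planes)                 # shape: (num_bits, *w_int.shape)

def bit_sparsity_penalty(planes):
    """L1 penalty on bit values; pushing high-order planes to zero reduces
    the bit-width actually needed to represent the layer."""
    return planes.mean()

w_int = torch.randint(0, 256, (4, 4))          # toy 8-bit unsigned weights
planes = to_bit_planes(w_int)
print(planes.shape, bit_sparsity_penalty(planes).item())
```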
arXiv Detail & Related papers (2021-02-20T22:37:41Z)
- Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables, and utilize a differentiable method to search them accurately.
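One way to picture "discrete weights as searchable variables" is to keep a learnable distribution over a few quantization levels for every weight and use its expectation during training; the sketch below follows that idea in a simplified form and is not the paper's actual method.

```python
# Hedged sketch: each discrete weight is a learnable distribution over a small
# set of quantization levels, searched with gradients via its expected value.
import torch
import torch.nn as nn

class SearchableLowBitWeight(nn.Module):
    def __init__(self, shape, levels=(-1.0, 0.0, 1.0)):
        super().__init__()
        self.levels = torch.tensor(levels)
        # One logit per weight per candidate level.
        self.logits = nn.Parameter(torch.zeros(*shape, len(levels)))

    def forward(self):
        probs = torch.softmax(self.logits, dim=-1)
        return (probs * self.levels).sum(dim=-1)        # differentiable "soft" weight

    def discretize(self):
        return self.levels[self.logits.argmax(dim=-1)]  # final hard weights

w = SearchableLowBitWeight((16, 16))
soft = w()               # used during training
hard = w.discretize()    # used at deployment
print(soft.shape, hard.unique())
```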
arXiv Detail & Related papers (2020-09-18T09:13:26Z)
- Finding Non-Uniform Quantization Schemes using Multi-Task Gaussian Processes [12.798516310559375]
We show that with significantly lower precision in the last layers, we achieve a minimal loss of accuracy with appreciable memory savings.
We test our findings on the CIFAR10 and ImageNet datasets using the VGG, ResNet and GoogLeNet architectures.
arXiv Detail & Related papers (2020-07-15T15:16:18Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
- Rethinking Differentiable Search for Mixed-Precision Neural Networks [83.55785779504868]
Low-precision networks with weights and activations quantized to low bit-width are widely used to accelerate inference on edge devices.
Current solutions are uniform, using identical bit-width for all filters.
This fails to account for the different sensitivities of different filters and is suboptimal.
Mixed-precision networks address this problem, by tuning the bit-width to individual filter requirements.
arXiv Detail & Related papers (2020-04-13T07:02:23Z)