Once Quantization-Aware Training: High Performance Extremely Low-bit
Architecture Search
- URL: http://arxiv.org/abs/2010.04354v3
- Date: Tue, 28 Sep 2021 06:53:15 GMT
- Title: Once Quantization-Aware Training: High Performance Extremely Low-bit
Architecture Search
- Authors: Mingzhu Shen, Feng Liang, Ruihao Gong, Yuhang Li, Chuming Li, Chen
Lin, Fengwei Yu, Junjie Yan, Wanli Ouyang
- Abstract summary: We propose to combine Network Architecture Search methods with quantization to enjoy the merits of the two sides.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to lower bit-widths, which further reduces the time cost and improves the quantization accuracy.
- Score: 112.05977301976613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantized Neural Networks (QNNs) have attracted a lot of attention due to
their high efficiency. To enhance quantization accuracy, prior works mainly
focus on designing advanced quantization algorithms but still fail to achieve
satisfactory results in the extremely low-bit case. In this work, we take an
architecture perspective to investigate the potential of high-performance QNNs.
Therefore, we propose to combine Network Architecture Search methods with
quantization to enjoy the merits of the two sides. However, a naive combination
inevitably faces unacceptable time consumption or unstable training problems. To
alleviate these problems, we first propose the joint training of architecture
and quantization with a shared step size to acquire a large number of quantized
models. Then a bit-inheritance scheme is introduced to transfer the quantized
models to lower bit-widths, which further reduces the time cost and
improves the quantization accuracy. Equipped with this overall framework,
dubbed Once Quantization-Aware Training (OQAT), our searched model family,
OQATNets, achieves a new state-of-the-art compared with various architectures
under different bit-widths. In particular, OQAT-2bit-M achieves 61.6% ImageNet
Top-1 accuracy, outperforming its 2-bit MobileNetV3 counterpart by a large margin
of 9% with 10% less computation cost. A series of quantization-friendly
architectures are easily identified, and extensive analysis can be made to
summarize the interaction between quantization and neural architectures. Code
and models are released at https://github.com/LaVieEnRoseSMZ/OQA
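As an illustration of the two mechanisms described in the abstract (a quantizer whose step size is learned jointly with the shared architecture weights, and bit inheritance that initializes a lower-bit model from a higher-bit one), the snippet below gives a minimal, hypothetical PyTorch sketch. The class name and the range-preserving rescale are our own assumptions for illustration and are not taken from the released OQA repository.

```python
# Hypothetical sketch (names are ours, not from the OQA repository): an
# LSQ-style weight quantizer with a learnable step size, plus a simple
# bit-inheritance step that rescales that step size when the bit-width drops.
import torch
import torch.nn as nn


class LearnableStepQuantizer(nn.Module):
    """Symmetric uniform quantizer with a learnable step size (LSQ-style)."""

    def __init__(self, bits: int, init_step: float = 0.1):
        super().__init__()
        self.bits = bits
        self.step = nn.Parameter(torch.tensor(init_step))

    def set_bits(self, bits: int, preserve_range: bool = True) -> None:
        # Bit inheritance (one plausible choice): rescale the learned step so
        # the representable range stays roughly the same at the lower bit-width.
        if preserve_range:
            old_max = 2 ** (self.bits - 1) - 1
            new_max = 2 ** (bits - 1) - 1
            with torch.no_grad():
                self.step.mul_(old_max / new_max)
        self.bits = bits

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        qmax = 2 ** (self.bits - 1) - 1
        scaled = torch.clamp(w / self.step, -qmax - 1, qmax)
        # Straight-through estimator: round in the forward pass only.
        rounded = scaled + (scaled.round() - scaled).detach()
        return rounded * self.step


# In a weight-sharing supernet, one such quantizer per layer could be trained
# jointly with the architecture weights; lower-bit models then inherit it.
quantizer = LearnableStepQuantizer(bits=4)
w = torch.randn(64, 32)
w_q4 = quantizer(w)    # 4-bit fake-quantized weights
quantizer.set_bits(3)  # inherit the 4-bit step size when moving to 3 bits
w_q3 = quantizer(w)
```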
Related papers
- ARQ: A Mixed-Precision Quantization Framework for Accurate and Certifiably Robust DNNs [15.43153209571646]
Mixed precision quantization has become an important technique for enabling the execution of deep neural networks (DNNs) on resource-limited computing platforms.
This paper introduces ARQ, an innovative mixed-precision quantization method that not only preserves the clean accuracy of the smoothed classifiers but also maintains their certified robustness.
arXiv Detail & Related papers (2024-10-31T17:59:37Z)
- AdaQAT: Adaptive Bit-Width Quantization-Aware Training [0.873811641236639]
Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios.
Model quantization is a common approach to deal with deployment constraints, but searching for optimized bit-widths can be challenging.
We present Adaptive Bit-Width Quantization-Aware Training (AdaQAT), a learning-based method that automatically optimizes bit-widths during training for more efficient inference.
arXiv Detail & Related papers (2024-04-22T09:23:56Z)
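The bit-width optimization that AdaQAT automates can be pictured with a generic recipe: relax the bit-width to a real-valued trainable parameter, round it with a straight-through estimator, and add a precision cost to the training objective. The snippet below follows only that generic recipe; the names and the cost term are hypothetical, and it is not the AdaQAT algorithm.

```python
# Hypothetical sketch of learning a bit-width during training (not the AdaQAT
# algorithm): the bit-width is a real-valued parameter, rounded with a
# straight-through estimator and penalized in the training objective.
import torch
import torch.nn as nn


class RelaxedBitWidth(nn.Module):
    def __init__(self, init_bits: float = 8.0, min_bits: float = 2.0, max_bits: float = 8.0):
        super().__init__()
        self.raw_bits = nn.Parameter(torch.tensor(init_bits))
        self.min_bits, self.max_bits = min_bits, max_bits

    def forward(self) -> torch.Tensor:
        b = torch.clamp(self.raw_bits, self.min_bits, self.max_bits)
        # Integer bit-width in the forward pass; gradients still reach raw_bits.
        return b + (b.round() - b).detach()


def fake_quantize(w: torch.Tensor, bits: torch.Tensor) -> torch.Tensor:
    qmax = 2.0 ** (bits - 1) - 1
    step = w.abs().max() / qmax
    scaled = torch.clamp(w / step, -qmax, qmax)
    return (scaled + (scaled.round() - scaled).detach()) * step


bitwidth = RelaxedBitWidth(init_bits=6.0)
w = torch.randn(16, 16, requires_grad=True)
# Toy objective: a stand-in task loss on the quantized weights plus a bit cost.
loss = fake_quantize(w, bitwidth()).pow(2).mean() + 1e-3 * bitwidth()
loss.backward()  # gradients flow to both the weights and the bit-width
```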
- SQUAT: Stateful Quantization-Aware Training in Recurrent Spiking Neural Networks [1.0923877073891446]
Spiking neural networks (SNNs) share the goal of enhancing efficiency, but adopt an 'event-driven' approach to reduce the power consumption of neural network inference.
This paper introduces two QAT schemes for stateful neurons: (i) a uniform quantization strategy, an established method for weight quantization, and (ii) threshold-centered quantization.
Our results show that increasing the density of quantization levels around the firing threshold improves accuracy across several benchmark datasets.
arXiv Detail & Related papers (2024-04-15T03:07:16Z)
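The threshold-centered scheme summarized for SQUAT places more quantization levels near the neuron's firing threshold, where membrane-potential precision matters most. The sketch below is a toy construction of such a non-uniform code book; the companding curve and every name in it are our own assumptions, not the paper's formulation.

```python
# Hypothetical sketch (not SQUAT's exact scheme): build a non-uniform code book
# whose levels are denser around the firing threshold, then snap membrane
# potentials to the nearest level.
import torch


def threshold_centered_levels(num_bits: int, threshold: float,
                              v_min: float, v_max: float,
                              sharpness: float = 2.0) -> torch.Tensor:
    """Place 2**num_bits levels in [v_min, v_max], concentrated near threshold."""
    n = 2 ** num_bits
    u = torch.linspace(-1.0, 1.0, n)
    # An odd companding curve: |u|**sharpness keeps more levels near u = 0,
    # which we map to the firing threshold.
    warped = torch.sign(u) * u.abs() ** sharpness
    below = threshold - v_min
    above = v_max - threshold
    return threshold + torch.where(warped < 0, warped * below, warped * above)


def quantize_to_levels(v: torch.Tensor, levels: torch.Tensor) -> torch.Tensor:
    # Nearest-level assignment (straight-through gradients could be added for QAT).
    idx = (v.unsqueeze(-1) - levels).abs().argmin(dim=-1)
    return levels[idx]


levels = threshold_centered_levels(num_bits=3, threshold=1.0, v_min=0.0, v_max=2.0)
v_mem = torch.rand(5) * 2.0               # toy membrane potentials in [0, 2]
print(quantize_to_levels(v_mem, levels))  # finer resolution near the threshold
```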
- Modular Quantization-Aware Training for 6D Object Pose Estimation [52.9436648014338]
Edge applications demand efficient 6D object pose estimation on resource-constrained embedded platforms.
We introduce Modular Quantization-Aware Training (MQAT), an adaptive and mixed-precision quantization-aware training strategy.
MQAT guides a systematic gradated modular quantization sequence and determines module-specific bit precisions, leading to quantized models that outperform those produced by state-of-the-art uniform and mixed-precision quantization techniques.
arXiv Detail & Related papers (2023-03-12T21:01:54Z)
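The modular strategy summarized for MQAT quantizes a network one module at a time, each at its own bit precision, with fine-tuning in between. The loop below is a generic, hypothetical sketch of that kind of schedule; the helper functions and the example bit assignments are placeholders, not MQAT's procedure or its searched precisions.

```python
# Hypothetical sketch of a module-wise quantization schedule (not MQAT's exact
# procedure): modules are quantized one after another, each at its own
# bit-width, with a short fine-tuning phase after every step.
import torch
import torch.nn as nn


def fake_quantize_weights(module: nn.Module, bits: int) -> None:
    """Replace a module's parameters with symmetric fake-quantized values in-place."""
    qmax = 2 ** (bits - 1) - 1
    for p in module.parameters():
        with torch.no_grad():
            step = p.abs().max() / qmax
            p.copy_(torch.clamp((p / step).round(), -qmax, qmax) * step)


def finetune(model: nn.Module, steps: int = 100) -> None:
    """Placeholder for a short QAT fine-tuning phase on the training set."""
    pass  # omitted: forward/backward passes with straight-through estimators


model = nn.Sequential(
    nn.Conv2d(3, 16, 3), nn.ReLU(),
    nn.Conv2d(16, 32, 3), nn.ReLU(),
)
# Placeholder schedule of (module index, bit-width) pairs; a real system would
# search both the quantization order and the per-module precisions.
schedule = [(2, 8), (0, 4)]
for idx, bits in schedule:
    fake_quantize_weights(model[idx], bits)
    finetune(model)  # recover accuracy before quantizing the next module
```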
- GHN-Q: Parameter Prediction for Unseen Quantized Convolutional Architectures via Graph Hypernetworks [80.29667394618625]
We conduct the first-ever study exploring the use of graph hypernetworks for predicting parameters of unseen quantized CNN architectures.
We focus on a reduced CNN search space and find that GHN-Q can in fact predict quantization-robust parameters for various 8-bit quantized CNNs.
arXiv Detail & Related papers (2022-08-26T08:00:02Z)
- Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
arXiv Detail & Related papers (2022-01-26T18:47:38Z)
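The greedy path-following mechanism behind GPFQ quantizes a layer's weights one coordinate at a time while an error term tracks the drift this causes on calibration data. The snippet below is a simplified sketch in that spirit (uniform grid, a single neuron, hypothetical names), not the exact GPFQ algorithm or its error analysis.

```python
# Simplified sketch of greedy, data-driven post-training quantization for one
# linear neuron, in the spirit of a greedy path-following scheme (not the exact
# GPFQ algorithm): weights are quantized one at a time while an error vector
# tracks the drift between quantized and full-precision layer outputs.
import torch


def greedy_quantize_layer(w: torch.Tensor, X: torch.Tensor, step: float) -> torch.Tensor:
    """w: (N,) weights of one neuron; X: (m, N) calibration inputs; step: grid size."""
    q = torch.zeros_like(w)
    u = torch.zeros(X.shape[0])           # running output error over the m samples
    for i in range(w.numel()):
        x_i = X[:, i]
        target = u + w[i] * x_i           # output mass this weight should carry
        # Best scalar on the quantization grid for this coordinate.
        a = torch.dot(x_i, target) / (x_i.dot(x_i) + 1e-12)
        q[i] = torch.round(a / step) * step
        u = target - q[i] * x_i           # carry the remaining error forward
    return q


# Toy usage: quantize one neuron's weights against random calibration data.
torch.manual_seed(0)
X = torch.randn(256, 32)                  # m = 256 calibration samples
w = torch.randn(32)
w_q = greedy_quantize_layer(w, X, step=0.1)
print((X @ w - X @ w_q).norm() / (X @ w).norm())  # relative output error
```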
- OMPQ: Orthogonal Mixed Precision Quantization [64.59700856607017]
Mixed precision quantization takes advantage of hardware's multiple bit-width arithmetic operations to unleash the full potential of network quantization.
We propose to optimize a proxy metric, the concept of network orthogonality, which is highly correlated with the loss of the integer programming problem.
This approach reduces the search time and required data amount by orders of magnitude, with little compromise on quantization accuracy.
arXiv Detail & Related papers (2021-09-16T10:59:33Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
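DropBits adapts dropout to precision: bits, rather than neurons, are randomly dropped during training. The sketch below illustrates that general idea with a sampling rule and names of our own; it is not the paper's exact formulation.

```python
# Hypothetical sketch of a dropout-style "bit-drop" regularizer (not the exact
# DropBits formulation): at each training step, a random number of bits may be
# dropped from the nominal precision before fake-quantizing the weights.
import torch


def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1
    step = w.abs().max() / qmax
    scaled = torch.clamp(w / step, -qmax, qmax)
    return (scaled + (scaled.round() - scaled).detach()) * step   # STE rounding


def bit_drop(w: torch.Tensor, nominal_bits: int = 4, drop_prob: float = 0.5,
             min_bits: int = 2, training: bool = True) -> torch.Tensor:
    bits = nominal_bits
    # With probability drop_prob, drop one bit (down to min_bits) this step.
    if training and torch.rand(()) < drop_prob and bits > min_bits:
        bits -= 1
    return fake_quantize(w, bits)


w = torch.randn(64, 64, requires_grad=True)
w_q = bit_drop(w, nominal_bits=4)   # sometimes quantized at 3 bits during training
w_q.sum().backward()                # gradients flow to w through the STE
```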
- Training Multi-bit Quantized and Binarized Networks with A Learnable Symmetric Quantizer [1.9659095632676098]
Quantizing weights and activations of deep neural networks is essential for deploying them in resource-constrained devices or cloud platforms.
While binarization is a special case of quantization, this extreme case often leads to several training difficulties.
We develop a unified quantization framework, denoted as UniQ, to overcome binarization difficulties.
arXiv Detail & Related papers (2021-04-01T02:33:31Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.