SDQ: Stochastic Differentiable Quantization with Mixed Precision
- URL: http://arxiv.org/abs/2206.04459v1
- Date: Thu, 9 Jun 2022 12:38:18 GMT
- Title: SDQ: Stochastic Differentiable Quantization with Mixed Precision
- Authors: Xijie Huang, Zhiqiang Shen, Shichao Li, Zechun Liu, Xianghong Hu,
Jeffry Wicaksana, Eric Xing, Kwang-Ting Cheng
- Abstract summary: We present a novel Stochastic Differentiable Quantization (SDQ) method that can automatically learn the mixed precision quantization (MPQ) strategy.
After the optimal MPQ strategy is acquired, we train our network with entropy-aware bin regularization and knowledge distillation.
SDQ outperforms all state-of-the-art mixed or single precision quantization methods at a lower bitwidth.
- Score: 46.232003346732064
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In order to deploy deep models in a computationally efficient manner, model
quantization approaches have been frequently used. In addition, as new hardware
that supports mixed bitwidth arithmetic operations emerges, recent research on
mixed precision quantization (MPQ) has begun to fully leverage the representational
capacity by searching for optimized bitwidths for different layers and modules
in a network. However, previous studies mainly search the MPQ strategy
in a costly scheme using reinforcement learning, neural architecture search,
etc., or simply utilize partial prior knowledge for bitwidth assignment, which
might be biased and sub-optimal. In this work, we present a novel Stochastic
Differentiable Quantization (SDQ) method that can automatically learn the MPQ
strategy in a more flexible and globally-optimized space with smoother gradient
approximation. Particularly, Differentiable Bitwidth Parameters (DBPs) are
employed as the probability factors in stochastic quantization between adjacent
bitwidth choices. After the optimal MPQ strategy is acquired, we further train
our network with entropy-aware bin regularization and knowledge distillation.
We extensively evaluate our method for several networks on different hardware
(GPUs and FPGA) and datasets. SDQ outperforms all state-of-the-art mixed or
single precision quantization with a lower bitwidth and is even better than the
full-precision counterparts across various ResNet and MobileNet families,
demonstrating the effectiveness and superiority of our method.
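The Differentiable Bitwidth Parameters described above act as probability factors for stochastically quantizing each layer to one of two adjacent bitwidth choices. The snippet below is a minimal PyTorch sketch of that idea under stated assumptions, not the authors' implementation: the module name, the sigmoid parameterization of the probability, and the straight-through trick are illustrative choices.

```python
import torch

def uniform_quantize(x, bits):
    # Symmetric uniform quantizer with a straight-through estimator so that
    # gradients still reach the latent full-precision weights.
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = torch.round(x / scale).clamp(-qmax, qmax) * scale
    return x + (q - x).detach()

class StochasticBitwidthQuantizer(torch.nn.Module):
    # Hypothetical layer-wise quantizer: a learnable logit (a stand-in for a
    # Differentiable Bitwidth Parameter) gives the probability of choosing the
    # higher of two adjacent bitwidth candidates.
    def __init__(self, low_bits=4, high_bits=8):
        super().__init__()
        self.low_bits, self.high_bits = low_bits, high_bits
        self.dbp_logit = torch.nn.Parameter(torch.zeros(()))

    def forward(self, w):
        p_high = torch.sigmoid(self.dbp_logit)  # probability of high_bits
        q_low = uniform_quantize(w, self.low_bits)
        q_high = uniform_quantize(w, self.high_bits)
        if self.training:
            # Sample a hard bitwidth choice but back-propagate through the
            # smooth expectation, so the task loss also updates dbp_logit.
            hard = torch.where(torch.rand(()) < p_high, q_high, q_low)
            soft = p_high * q_high + (1.0 - p_high) * q_low
            return soft + (hard - soft).detach()
        # After the search, commit to the more probable bitwidth.
        return q_high if p_high.item() > 0.5 else q_low
```

In use, one such quantizer per layer would wrap that layer's weight in the forward pass; the sign of the learned logit then indicates the layer's final bitwidth once the MPQ strategy is fixed.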
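The abstract also states that, once the MPQ strategy is fixed, the network is retrained with entropy-aware bin regularization and knowledge distillation. The sketch below combines a standard KL-based distillation loss with a generic bin regularizer that pulls each weight toward its nearest quantization bin center; the paper's entropy-aware weighting is not reproduced, and all names and coefficients are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    # Standard temperature-scaled KL distillation loss.
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

def bin_regularizer(weight, bits):
    # Generic stand-in for a bin regularizer: penalize the squared distance of
    # each weight to its nearest quantization bin center so that the weight
    # distribution inside each bin becomes sharp. (No entropy-aware weighting.)
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().max().clamp(min=1e-8) / qmax
    centers = torch.round(weight / scale).clamp(-qmax, qmax) * scale
    return ((weight - centers) ** 2).mean()

def retraining_loss(student_logits, teacher_logits, targets, weights, bitwidths,
                    alpha=0.5, beta=1e-4):
    # Total fine-tuning objective: task loss + distillation + bin regularization.
    ce = F.cross_entropy(student_logits, targets)
    kd = kd_loss(student_logits, teacher_logits)
    reg = sum(bin_regularizer(w, b) for w, b in zip(weights, bitwidths))
    return ce + alpha * kd + beta * reg
```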
Related papers
- Coverage Analysis for Digital Cousin Selection -- Improving Multi-Environment Q-Learning [24.212773534280387]
Recent advancements include multi-environment mixed Q-learning (MEMQ) algorithms.
MEMQ algorithms outperform several state-of-the-art Q-learning algorithms in terms of accuracy, complexity, and robustness.
We present a novel CC-based MEMQ algorithm to improve the accuracy and reduce the complexity of existing MEMQ algorithms.
arXiv Detail & Related papers (2024-11-13T06:16:12Z) - Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without huge computational overhead.
We validate our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z) - Mixed-Precision Quantization with Cross-Layer Dependencies [6.338965603383983]
Mixed-precision quantization (MPQ) assigns varied bit-widths to layers to optimize the accuracy-efficiency trade-off.
Existing methods simplify the MPQ problem by assuming that quantization errors at different layers act independently.
We show that this assumption does not reflect the true behavior of quantized deep neural networks.
arXiv Detail & Related papers (2023-07-11T15:56:00Z) - Data Quality-aware Mixed-precision Quantization via Hybrid Reinforcement
Learning [22.31766292657812]
Most mixed-precision quantization methods predetermine the model's bit-width settings before actual training.
We propose a novel Data Quality-aware Mixed-precision Quantization framework, dubbed DQMQ, to dynamically adapt quantization bit-widths to different data qualities.
arXiv Detail & Related papers (2023-02-09T06:14:00Z) - Cluster-Promoting Quantization with Bit-Drop for Minimizing Network
Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z) - Effective and Fast: A Novel Sequential Single Path Search for
Mixed-Precision Quantization [45.22093693422085]
A mixed-precision quantization model can assign different quantization bit-precisions according to the sensitivity of each layer to achieve strong performance.
Quickly determining the quantization bit-precision of each layer in a deep neural network under given constraints is a difficult problem.
We propose a novel sequential single path search (SSPS) method for mixed-precision quantization.
arXiv Detail & Related papers (2021-03-04T09:15:08Z) - All at Once Network Quantization via Collaborative Knowledge Transfer [56.95849086170461]
We develop a novel collaborative knowledge transfer approach for efficiently training the all-at-once quantization network.
Specifically, we propose an adaptive selection strategy to choose a high-precision "teacher" for transferring knowledge to the low-precision student.
To effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
arXiv Detail & Related papers (2021-03-02T03:09:03Z) - BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network
Quantization [32.770842274996774]
Mixed-precision quantization can potentially achieve the optimal tradeoff between performance and compression rate of deep neural networks.
Previous methods either examine only a small manually-designed search space or utilize a cumbersome neural architecture search to explore the vast search space.
This work proposes bit-level sparsity quantization (BSQ) to tackle the mixed-precision quantization from a new angle of inducing bit-level sparsity.
arXiv Detail & Related papers (2021-02-20T22:37:41Z) - APQ: Joint Search for Network Architecture, Pruning and Quantization
Policy [49.3037538647714]
We present APQ for efficient deep learning inference on resource-constrained hardware.
Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner.
With the same accuracy, APQ reduces the latency/energy by 2x/1.3x over MobileNetV2+HAQ.
arXiv Detail & Related papers (2020-06-15T16:09:17Z) - Rethinking Differentiable Search for Mixed-Precision Neural Networks [83.55785779504868]
Low-precision networks with weights and activations quantized to low bit-width are widely used to accelerate inference on edge devices.
Current solutions are uniform, using identical bit-width for all filters.
This fails to account for the different sensitivities of different filters and is suboptimal.
Mixed-precision networks address this problem by tuning the bit-width to individual filter requirements.
arXiv Detail & Related papers (2020-04-13T07:02:23Z)