Arbitrary Bit-width Network: A Joint Layer-Wise Quantization and
Adaptive Inference Approach
- URL: http://arxiv.org/abs/2204.09992v1
- Date: Thu, 21 Apr 2022 09:36:43 GMT
- Title: Arbitrary Bit-width Network: A Joint Layer-Wise Quantization and
Adaptive Inference Approach
- Authors: Chen Tang, Haoyu Zhai, Kai Ouyang, Zhi Wang, Yifei Zhu, Wenwu Zhu
- Abstract summary: We propose to feed different data samples with varying quantization schemes to achieve a data-dependent dynamic inference, at a fine-grained layer level.
We present the Arbitrary Bit-width Network (ABN), where the bit-widths of a single deep network can change at runtime for different data samples, with a layer-wise granularity.
On ImageNet classification, we achieve a 1.1% top-1 accuracy improvement while saving 36.2% BitOps.
- Score: 38.03309300383544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional model quantization methods apply a fixed quantization scheme
to all data samples, ignoring the inherent differences in "recognition difficulty"
between samples. We propose to process different data samples with varying
quantization schemes to achieve data-dependent dynamic inference at a
fine-grained layer level. However, enabling this adaptive
inference with changeable layer-wise quantization schemes is challenging
because the number of combinations of bit-widths and layers grows exponentially,
making it extremely difficult to train a single model over such a vast search
space and use it in practice. To solve this problem, we present the Arbitrary
Bit-width Network (ABN), where the bit-widths of a single deep network can
change at runtime for different data samples, with a layer-wise granularity.
Specifically, we first build a weight-shared, layer-wise quantizable
"super-network" in which each layer can be allocated multiple bit-widths
and thus quantized differently on demand. The super-network provides a
very large number of combinations of bit-widths and layers, each of
which can be used during inference without retraining or storing myriad models.
Second, based on the well-trained super-network, each layer's runtime bit-width
selection is modeled as a Markov Decision Process (MDP) and solved by a
corresponding adaptive inference strategy. Experiments show that the
super-network can be built without accuracy degradation, and that the bit-width
allocation of each layer can be adjusted to deal with various inputs on the
fly. On ImageNet classification, we achieve a 1.1% top-1 accuracy improvement
while saving 36.2% BitOps.
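
To make the two components of the abstract concrete, here is a minimal, hedged PyTorch-style sketch (not the authors' released code): a weight-shared convolutional layer whose single FP32 weight tensor can be fake-quantized to any of several candidate bit-widths at runtime, plus a simple greedy per-layer selector standing in for the paper's MDP-based adaptive inference policy. The uniform symmetric quantizer, the candidate bit-widths, the error tolerance, and all identifiers (fake_quantize, QuantizableConv2d, select_bitwidths) are illustrative assumptions.

```python
# Hypothetical sketch of a layer-wise quantizable "super-network" layer and a
# greedy runtime bit-width selector; the authors' quantizer and MDP policy may
# differ, this only illustrates where the per-layer decision plugs in.
import torch
import torch.nn as nn
import torch.nn.functional as F

CANDIDATE_BITS = (2, 4, 8)  # assumed per-layer bit-width choices


def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale


class QuantizableConv2d(nn.Module):
    """One weight-shared layer: a single FP32 weight tensor that can be
    quantized to any candidate bit-width on demand at inference time."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)
        self.bits = max(CANDIDATE_BITS)  # runtime-selectable bit-width

    def forward(self, x):
        return F.conv2d(x, fake_quantize(self.weight, self.bits), padding=1)


def select_bitwidths(layers, x, tol: float = 0.05):
    """Greedy stand-in for the paper's MDP policy: walk the layers in order and,
    for each, pick the cheapest bit-width whose output stays within a relative
    error `tol` of the layer's full-precision output for this input."""
    decisions = []
    for layer in layers:
        ref = F.conv2d(x, layer.weight, padding=1)  # full-precision reference
        chosen = max(CANDIDATE_BITS)
        for bits in sorted(CANDIDATE_BITS):         # try the cheapest first
            out = F.conv2d(x, fake_quantize(layer.weight, bits), padding=1)
            if (out - ref).norm() / ref.norm().clamp(min=1e-8) < tol:
                chosen = bits
                break
        layer.bits = chosen
        decisions.append(chosen)
        x = F.relu(layer(x))  # propagate with the chosen bit-width
    return decisions


if __name__ == "__main__":
    net = nn.ModuleList([QuantizableConv2d(3 if i == 0 else 16, 16) for i in range(4)])
    with torch.no_grad():
        print("per-layer bit-widths:", select_bitwidths(net, torch.randn(1, 3, 32, 32)))
```

In ABN itself the per-layer decision is obtained by solving an MDP over a trained super-network rather than by this threshold heuristic; the sketch only shows where a per-sample, per-layer bit-width choice enters inference.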
Related papers
- A Practical Mixed Precision Algorithm for Post-Training Quantization [15.391257986051249]
Mixed-precision quantization is a promising approach for finding a better performance-efficiency trade-off than homogeneous quantization.
We present a simple post-training mixed precision algorithm that only requires a small unlabeled calibration dataset.
We show that we can find mixed precision networks that provide a better trade-off between accuracy and efficiency than their homogeneous bit-width equivalents.
arXiv Detail & Related papers (2023-02-10T17:47:54Z)
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- Layer Ensembles [95.42181254494287]
We introduce a method for uncertainty estimation that considers a set of independent categorical distributions for each layer of the network.
We show that the method can be further improved by ranking samples, resulting in models that require less memory and time to run.
arXiv Detail & Related papers (2022-10-10T17:52:47Z)
- SDQ: Stochastic Differentiable Quantization with Mixed Precision [46.232003346732064]
We present a novel Stochastic Differentiable Quantization (SDQ) method that can automatically learn the mixed-precision quantization (MPQ) strategy.
After the optimal MPQ strategy is acquired, we train our network with entropy-aware bin regularization and knowledge distillation.
SDQ outperforms all state-of-the-art mixed- or single-precision quantization methods with a lower bit-width.
arXiv Detail & Related papers (2022-06-09T12:38:18Z)
- Gated recurrent units and temporal convolutional network for multilabel classification [122.84638446560663]
This work proposes a new ensemble method for managing multilabel classification.
The core of the proposed approach combines a set of gated recurrent units and temporal convolutional neural networks trained with variants of the Adam gradients optimization approach.
arXiv Detail & Related papers (2021-10-09T00:00:16Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- All at Once Network Quantization via Collaborative Knowledge Transfer [56.95849086170461]
We develop a novel collaborative knowledge transfer approach for efficiently training the all-at-once quantization network.
Specifically, we propose an adaptive selection strategy to choose a high-precision "teacher" for transferring knowledge to the low-precision student.
To effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
arXiv Detail & Related papers (2021-03-02T03:09:03Z)
- A Greedy Algorithm for Quantizing Neural Networks [4.683806391173103]
We propose a new computationally efficient method for quantizing the weights of pre-trained neural networks.
Our method deterministically quantizes layers in an iterative fashion with no complicated re-training required.
arXiv Detail & Related papers (2020-10-29T22:53:10Z)
- WaveQ: Gradient-Based Deep Quantization of Neural Networks through Sinusoidal Adaptive Regularization [8.153944203144988]
We propose a novel sinusoidal regularization, called SINAREQ, for deep quantized training.
We show how SINAREQ balances compute efficiency and accuracy, and provides a heterogeneous bit-width assignment for quantizing a large variety of deep networks (see the sketch after this list).
arXiv Detail & Related papers (2020-02-29T01:19:55Z)
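
As a small illustration of the sinusoidal-regularization idea in the WaveQ/SINAREQ entry above: a penalty built from sin^2(pi * w / step) is zero exactly when weights sit on a uniform quantization grid, so adding it to the training loss nudges weights toward quantizable values. The grid-step choice, the loss weight, and the function name below are assumptions for illustration; the paper's exact formulation (including how it assigns heterogeneous bit-widths) may differ.

```python
# Hedged sketch of a sinusoidal quantization regularizer in the spirit of the
# WaveQ/SINAREQ entry above; the paper's exact formulation may differ.
import torch


def sinusoidal_quant_reg(params, bits: int = 4, strength: float = 1e-4):
    """Penalty that vanishes when weights lie on a uniform `bits`-bit grid and
    grows as they drift off-grid, nudging training toward quantizable weights."""
    total = 0.0
    for w in params:
        # Assumed grid step for a symmetric uniform quantizer spanning [-max, max].
        step = 2 * w.abs().max().clamp(min=1e-8) / (2 ** bits - 1)
        total = total + torch.sin(torch.pi * w / step).pow(2).mean()
    return strength * total


if __name__ == "__main__":
    params = [torch.randn(64, 32), torch.randn(128, 64)]
    # Would typically be added to the task loss during training.
    print("regularizer value:", float(sinusoidal_quant_reg(params)))
```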
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.