Bitwidth-Adaptive Quantization-Aware Neural Network Training: A
Meta-Learning Approach
- URL: http://arxiv.org/abs/2207.10188v1
- Date: Wed, 20 Jul 2022 20:39:39 GMT
- Title: Bitwidth-Adaptive Quantization-Aware Neural Network Training: A
Meta-Learning Approach
- Authors: Jiseok Youn, Jaehun Song, Hyung-Sin Kim, Saewoong Bahk
- Abstract summary: We propose a meta-learning approach to achieve deep neural network quantization with adaptive bitwidths.
MEBQAT allows the (meta-)trained model to be quantized to any candidate bitwidth and then to perform inference without a significant accuracy drop from quantization.
We experimentally demonstrate its validity in multiple QAT schemes.
- Score: 6.122150357599037
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural network quantization with adaptive bitwidths has gained
increasing attention due to the ease of model deployment on various platforms
with different resource budgets. In this paper, we propose a meta-learning
approach to achieve this goal. Specifically, we propose MEBQAT, a simple yet
effective way of bitwidth-adaptive quantization-aware training (QAT) in which
meta-learning is combined with QAT by redefining meta-learning tasks to
incorporate bitwidths. After being deployed on a platform, MEBQAT allows the
(meta-)trained model to be quantized to any candidate bitwidth and then to
perform inference without a significant accuracy drop from quantization.
Moreover, in a few-shot learning scenario, MEBQAT can also adapt a model to
any bitwidth as well as to unseen target classes by adding conventional
optimization- or metric-based meta-learning. We design variants of MEBQAT to
support both (1) a bitwidth-adaptive quantization scenario and (2) a new
few-shot learning scenario where both quantization bitwidths and target classes
are jointly adapted. We experimentally demonstrate their validity in multiple
QAT schemes. By comparing their performance to (bitwidth-dedicated) QAT,
existing bitwidth-adaptive QAT, and vanilla meta-learning, we find that merging
bitwidths into meta-learning tasks achieves a higher level of robustness.
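
To make the core idea concrete, here is a minimal sketch of bitwidth-adaptive QAT in the spirit of MEBQAT: candidate bitwidths are treated as tasks, each forward pass fake-quantizes the weights to a given bitwidth with a straight-through estimator, and the losses from all candidate bitwidths are accumulated into one shared update. This is a hedged simplification, not the authors' implementation; the TinyNet model, the candidate_bits set, and the single-update objective are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(x, bits):
    """Uniform symmetric fake quantization with a straight-through estimator."""
    if bits >= 32:  # treat 32 as full precision
        return x
    qmax = 2 ** (bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return x + (q - x).detach()  # forward uses q, gradients flow to x

class QuantLinear(nn.Module):
    """Linear layer whose weights are fake-quantized to a bitwidth chosen at call time."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x, bits):
        return F.linear(x, fake_quantize(self.weight, bits), self.bias)

class TinyNet(nn.Module):
    """Toy two-layer classifier used only for illustration."""
    def __init__(self):
        super().__init__()
        self.fc1 = QuantLinear(784, 128)
        self.fc2 = QuantLinear(128, 10)

    def forward(self, x, bits):
        return self.fc2(F.relu(self.fc1(x, bits)), bits)

def bitwidth_adaptive_step(model, optimizer, x, y, candidate_bits=(2, 4, 8, 32)):
    """One training step: treat each candidate bitwidth as a 'task' and
    accumulate its loss before a single shared parameter update."""
    optimizer.zero_grad()
    total = 0.0
    for bits in candidate_bits:
        loss = F.cross_entropy(model(x, bits), y)
        loss.backward()  # gradients from all bitwidths accumulate
        total += loss.item()
    optimizer.step()
    return total / len(candidate_bits)

model = TinyNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
print(bitwidth_adaptive_step(model, opt, x, y))
```

In the paper's few-shot variants, this shared update would instead sit inside an optimization- or metric-based meta-learning outer loop that samples both bitwidths and target-class subsets as tasks.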
Related papers
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models [50.525259103219256]
Quantization-aware training (QAT) offers a solution by reducing memory consumption through low-bit representations with minimal accuracy loss.
We propose Efficient Quantization-Aware Training (EfficientQAT), a more feasible QAT algorithm.
EfficientQAT involves two consecutive phases: block-wise training of all parameters (Block-AP) and end-to-end training of quantization parameters (E2E-QP).
arXiv Detail & Related papers (2024-07-10T17:53:30Z)
- AdaQAT: Adaptive Bit-Width Quantization-Aware Training [0.873811641236639]
Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios.
Model quantization is a common approach to deal with deployment constraints, but searching for optimized bit-widths can be challenging.
We present Adaptive Bit-Width Quantization-Aware Training (AdaQAT), a learning-based method that automatically optimizes bit-widths during training for more efficient inference.
arXiv Detail & Related papers (2024-04-22T09:23:56Z)
- Learning to Learn with Indispensable Connections [6.040904021861969]
We propose a novel meta-learning method called Meta-LTH that includes indispensable (necessary) connections.
Our method improves classification accuracy by approximately 2% (in the 20-way 1-shot task setting) on the Omniglot dataset.
arXiv Detail & Related papers (2023-04-06T04:53:13Z)
- Data Quality-aware Mixed-precision Quantization via Hybrid Reinforcement Learning [22.31766292657812]
Mixed-precision quantization mostly predetermines the model bit-width settings before actual training.
We propose a novel Data Quality-aware Mixed-precision Quantization framework, dubbed DQMQ, to dynamically adapt quantization bit-widths to different data qualities.
arXiv Detail & Related papers (2023-02-09T06:14:00Z)
- Vertical Layering of Quantized Neural Networks for Heterogeneous Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
We can theoretically achieve any precision network for on-demand service while only needing to train and maintain one model.
arXiv Detail & Related papers (2022-12-10T15:57:38Z)
- SDQ: Stochastic Differentiable Quantization with Mixed Precision [46.232003346732064]
We present a novel Stochastic Differentiable Quantization (SDQ) method that can automatically learn the mixed-precision quantization (MPQ) strategy.
After the optimal MPQ strategy is acquired, we train our network with entropy-aware bin regularization and knowledge distillation.
SDQ outperforms all state-of-the-art mixed- or single-precision quantization methods with a lower bitwidth.
arXiv Detail & Related papers (2022-06-09T12:38:18Z)
- Online Meta Adaptation for Variable-Rate Learned Image Compression [40.8361915315201]
This work addresses two major issues of end-to-end learned image compression (LIC) based on deep neural networks.
We introduce an online meta-learning (OML) setting for LIC, which combines ideas from meta-learning and online learning in the conditional variational auto-encoder framework.
arXiv Detail & Related papers (2021-11-16T06:46:23Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons; a minimal sketch of this bit-drop idea appears after this list.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- Bit-Mixer: Mixed-precision networks with runtime bit-width selection [72.32693989093558]
Bit-Mixer is the first method to train a meta-quantized network where, during test time, any layer can change its bit-width without affecting the overall network's ability for highly accurate inference.
We show that our method can result in mixed precision networks that exhibit the desirable flexibility properties for on-device deployment without compromising accuracy.
arXiv Detail & Related papers (2021-03-31T17:58:47Z)
- All at Once Network Quantization via Collaborative Knowledge Transfer [56.95849086170461]
We develop a novel collaborative knowledge transfer approach for efficiently training the all-at-once quantization network.
Specifically, we propose an adaptive selection strategy to choose a high-precision "teacher" for transferring knowledge to the low-precision student.
To effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
arXiv Detail & Related papers (2021-03-02T03:09:03Z)
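
As a companion illustration for the bit-drop idea summarized in the Cluster-Promoting Quantization entry above, the following is a minimal sketch assuming a plain uniform symmetric fake quantizer: with some probability during training, a tensor is quantized to a randomly chosen lower bitwidth, analogous to dropout acting on bits rather than neurons. It is a hypothetical simplification, not the DropBits implementation; the bitwidth range and drop probability are illustrative assumptions.

```python
import torch

def dropbits_fake_quantize(x, max_bits=8, min_bits=2, p_drop=0.5, training=True):
    """Fake-quantize x, randomly 'dropping bits' during training.

    A hypothetical simplification of the bit-drop idea: instead of dropping
    neurons as in dropout, occasionally quantize with fewer bits than usual.
    """
    bits = max_bits
    if training and torch.rand(()).item() < p_drop:
        # Drop to a randomly chosen lower bitwidth for this call.
        bits = int(torch.randint(min_bits, max_bits, (1,)).item())
    qmax = 2 ** (bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return x + (q - x).detach()  # straight-through estimator for gradients

w = torch.randn(4, 4, requires_grad=True)
print(dropbits_fake_quantize(w, training=True))
```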