Bitwidth-Adaptive Quantization-Aware Neural Network Training: A
Meta-Learning Approach
- URL: http://arxiv.org/abs/2207.10188v1
- Date: Wed, 20 Jul 2022 20:39:39 GMT
- Title: Bitwidth-Adaptive Quantization-Aware Neural Network Training: A
Meta-Learning Approach
- Authors: Jiseok Youn, Jaehun Song, Hyung-Sin Kim, Saewoong Bahk
- Abstract summary: We propose a meta-learning approach to achieve deep neural network quantization with adaptive bitwidths.
MEBQAT allows the (meta-)trained model to be quantized to any candidate bitwidth and then to perform inference without a significant accuracy drop from quantization.
We experimentally demonstrate its validity in multiple QAT schemes.
- Score: 6.122150357599037
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural network quantization with adaptive bitwidths has gained
increasing attention due to the ease of model deployment on various platforms
with different resource budgets. In this paper, we propose a meta-learning
approach to achieve this goal. Specifically, we propose MEBQAT, a simple yet
effective way of bitwidth-adaptive quantization-aware training (QAT) in which
meta-learning is combined with QAT by redefining meta-learning tasks to
incorporate bitwidths. After being deployed on a platform, MEBQAT allows the
(meta-)trained model to be quantized to any candidate bitwidth and then to
perform inference without a significant accuracy drop from quantization.
Moreover, in a few-shot learning scenario, MEBQAT can also adapt a model to
any bitwidth as well as to unseen target classes by adding conventional
optimization- or metric-based meta-learning. We design variants of MEBQAT to
support both (1) a bitwidth-adaptive quantization scenario and (2) a new
few-shot learning scenario where both quantization bitwidths and target classes
are jointly adapted. We experimentally demonstrate their validity in multiple
QAT schemes. By comparing their performance to (bitwidth-dedicated) QAT,
existing bitwidth-adaptive QAT, and vanilla meta-learning, we find that merging
bitwidths into meta-learning tasks achieves a higher level of robustness.
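
To make the core idea concrete, here is a minimal sketch of bitwidth-adaptive QAT in the spirit of MEBQAT: candidate bitwidths are treated as tasks, each forward pass fake-quantizes the weights to a given bitwidth with a straight-through estimator, and the losses from all candidate bitwidths are accumulated into one shared update. This is a hedged simplification, not the authors' implementation; the TinyNet model, the candidate_bits set, and the single-update objective are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(x, bits):
    """Uniform symmetric fake quantization with a straight-through estimator."""
    if bits >= 32:  # treat 32 as full precision
        return x
    qmax = 2 ** (bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return x + (q - x).detach()  # forward uses q, gradients flow to x

class QuantLinear(nn.Module):
    """Linear layer whose weights are fake-quantized to a bitwidth chosen at call time."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x, bits):
        return F.linear(x, fake_quantize(self.weight, bits), self.bias)

class TinyNet(nn.Module):
    """Toy two-layer classifier used only for illustration."""
    def __init__(self):
        super().__init__()
        self.fc1 = QuantLinear(784, 128)
        self.fc2 = QuantLinear(128, 10)

    def forward(self, x, bits):
        return self.fc2(F.relu(self.fc1(x, bits)), bits)

def bitwidth_adaptive_step(model, optimizer, x, y, candidate_bits=(2, 4, 8, 32)):
    """One training step: treat each candidate bitwidth as a 'task' and
    accumulate its loss before a single shared parameter update."""
    optimizer.zero_grad()
    total = 0.0
    for bits in candidate_bits:
        loss = F.cross_entropy(model(x, bits), y)
        loss.backward()  # gradients from all bitwidths accumulate
        total += loss.item()
    optimizer.step()
    return total / len(candidate_bits)

model = TinyNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
print(bitwidth_adaptive_step(model, opt, x, y))
```

In the paper's few-shot variants, this shared update would instead sit inside an optimization- or metric-based meta-learning outer loop that samples both bitwidths and target-class subsets as tasks.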
Related papers
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models [50.525259103219256]
Quantization-aware training (QAT) offers a solution by reducing memory consumption through low-bit representations with minimal accuracy loss.
We propose Efficient Quantization-Aware Training (EfficientQAT), a more feasible QAT algorithm.
EfficientQAT involves two consecutive phases: block-wise training of all parameters (Block-AP) and end-to-end training of quantization parameters (E2E-QP).
arXiv Detail & Related papers (2024-07-10T17:53:30Z)
- AdaQAT: Adaptive Bit-Width Quantization-Aware Training [0.873811641236639]
Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios.
Model quantization is a common approach to deal with deployment constraints, but searching for optimized bit-widths can be challenging.
We present Adaptive Bit-Width Quantization-Aware Training (AdaQAT), a learning-based method that automatically optimizes bit-widths during training for more efficient inference.
arXiv Detail & Related papers (2024-04-22T09:23:56Z)
- Learning to Learn with Indispensable Connections [6.040904021861969]
We propose a novel meta-learning method called Meta-LTH that includes indispensable (necessary) connections.
Our method improves classification accuracy by approximately 2% (in the 20-way 1-shot task setting) on the Omniglot dataset.
arXiv Detail & Related papers (2023-04-06T04:53:13Z)
- Data Quality-aware Mixed-precision Quantization via Hybrid Reinforcement Learning [22.31766292657812]
Mixed-precision quantization mostly predetermines the model bit-width settings before actual training.
We propose a novel Data Quality-aware Mixed-precision Quantization framework, dubbed DQMQ, to dynamically adapt quantization bit-widths to different data qualities.
arXiv Detail & Related papers (2023-02-09T06:14:00Z)
- Vertical Layering of Quantized Neural Networks for Heterogeneous Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
We can theoretically achieve any precision network for on-demand service while only needing to train and maintain one model.
arXiv Detail & Related papers (2022-12-10T15:57:38Z)
- SDQ: Stochastic Differentiable Quantization with Mixed Precision [46.232003346732064]
We present a novel Stochastic Differentiable Quantization (SDQ) method that can automatically learn the mixed-precision quantization (MPQ) strategy.
After the optimal MPQ strategy is acquired, we train our network with entropy-aware bin regularization and knowledge distillation.
SDQ outperforms all state-of-the-art mixed- or single-precision quantization methods with a lower bitwidth.
arXiv Detail & Related papers (2022-06-09T12:38:18Z)
- Online Meta Adaptation for Variable-Rate Learned Image Compression [40.8361915315201]
This work addresses two major issues of end-to-end learned image compression (LIC) based on deep neural networks.
We introduce an online meta-learning (OML) setting for LIC, which combines ideas from meta-learning and online learning in the conditional variational auto-encoder framework.
arXiv Detail & Related papers (2021-11-16T06:46:23Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons; a minimal sketch of this bit-drop idea appears after this list.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- Bit-Mixer: Mixed-precision networks with runtime bit-width selection [72.32693989093558]
Bit-Mixer is the first method to train a meta-quantized network where, during test time, any layer can change its bit-width without affecting the overall network's ability for highly accurate inference.
We show that our method can result in mixed precision networks that exhibit the desirable flexibility properties for on-device deployment without compromising accuracy.
arXiv Detail & Related papers (2021-03-31T17:58:47Z)
- All at Once Network Quantization via Collaborative Knowledge Transfer [56.95849086170461]
We develop a novel collaborative knowledge transfer approach for efficiently training the all-at-once quantization network.
Specifically, we propose an adaptive selection strategy to choose a high-precision "teacher" for transferring knowledge to the low-precision student.
To effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
arXiv Detail & Related papers (2021-03-02T03:09:03Z)
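
As a companion illustration for the bit-drop idea summarized in the Cluster-Promoting Quantization entry above, the following is a minimal sketch assuming a plain uniform symmetric fake quantizer: with some probability during training, a tensor is quantized to a randomly chosen lower bitwidth, analogous to dropout acting on bits rather than neurons. It is a hypothetical simplification, not the DropBits implementation; the bitwidth range and drop probability are illustrative assumptions.

```python
import torch

def dropbits_fake_quantize(x, max_bits=8, min_bits=2, p_drop=0.5, training=True):
    """Fake-quantize x, randomly 'dropping bits' during training.

    A hypothetical simplification of the bit-drop idea: instead of dropping
    neurons as in dropout, occasionally quantize with fewer bits than usual.
    """
    bits = max_bits
    if training and torch.rand(()).item() < p_drop:
        # Drop to a randomly chosen lower bitwidth for this call.
        bits = int(torch.randint(min_bits, max_bits, (1,)).item())
    qmax = 2 ** (bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return x + (q - x).detach()  # straight-through estimator for gradients

w = torch.randn(4, 4, requires_grad=True)
print(dropbits_fake_quantize(w, training=True))
```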