Automatic low-bit hybrid quantization of neural networks through meta learning
- URL: http://arxiv.org/abs/2004.11506v1
- Date: Fri, 24 Apr 2020 02:01:26 GMT
- Title: Automatic low-bit hybrid quantization of neural networks through meta learning
- Authors: Tao Wang, Junsong Wang, Chang Xu and Chao Xue
- Abstract summary: We employ the meta learning method to automatically realize low-bit hybrid quantization of neural networks.
A MetaQuantNet, together with a Quantization function, is trained to generate the quantized weights for the target DNN.
With the best searched quantization policy, we subsequently retrain or finetune to further improve the performance of the quantized target network.
- Score: 22.81983466720024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model quantization is a widely used technique to compress and accelerate deep
neural network (DNN) inference, especially when deploying to edge or IoT
devices with limited computation capacity and power consumption budgets. Uniform bit-width quantization across all layers is usually sub-optimal, and exploring hybrid quantization for different layers is vital for efficient deep compression. In this paper, we employ a meta learning method
to automatically realize low-bit hybrid quantization of neural networks. A
MetaQuantNet, together with a Quantization function, is trained to generate
the quantized weights for the target DNN. Then, we apply a genetic algorithm to
search for the best hybrid quantization policy that meets the compression constraints.
With the best searched quantization policy, we subsequently retrain or finetune
to further improve the performance of the quantized target network. Extensive experiments demonstrate that the performance of the searched hybrid quantization scheme surpasses that of its uniform bit-width counterpart. Compared to the existing reinforcement learning (RL) based hybrid quantization search approach, which relies on tedious exploration, our meta learning approach is more efficient and effective for any compression requirement, since the MetaQuantNet only needs to be trained once.
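To make the search setup concrete, here is a minimal sketch of the two ingredients the abstract describes: a per-layer low-bit quantization function and a genetic search over per-layer bit-widths under a compression budget. It is an illustrative stand-in, not the paper's MetaQuantNet; the fitness proxy (quantization error), the budget, and all hyper-parameters are assumptions.

```python
# Hedged sketch: per-layer uniform quantization + genetic search over bit-widths.
# The fitness proxy, budget, and GA hyper-parameters are illustrative placeholders,
# not the paper's MetaQuantNet / Quantization function.
import random
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax if np.max(np.abs(w)) > 0 else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def fitness(policy, layers, budget_bits=4.0):
    """Placeholder fitness: negative quantization error, invalid if over budget."""
    if np.mean(policy) > budget_bits:          # compression constraint
        return -np.inf
    err = sum(np.mean((w - quantize_uniform(w, b)) ** 2)
              for w, b in zip(layers, policy))
    return -err

def genetic_search(layers, pop_size=20, generations=30, bit_choices=(2, 3, 4, 6, 8)):
    n = len(layers)
    pop = [[random.choice(bit_choices) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, layers), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)               # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.2:                  # mutation
                child[random.randrange(n)] = random.choice(bit_choices)
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda p: fitness(p, layers))

# Example: search a hybrid bit-width policy for three random "layers".
layers = [np.random.randn(64, 64) for _ in range(3)]
print(genetic_search(layers))
```

In the paper's pipeline, candidate policies are evaluated with quantized weights generated by the trained MetaQuantNet rather than with a simple reconstruction-error proxy, which is what removes the need for per-candidate retraining during the search.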
Related papers
- A generic and robust quantum agent inspired by deep meta-reinforcement learning [4.881040823544883]
We develop a new training algorithm inspired by deep meta-reinforcement learning (deep meta-RL).
The trained neural network is adaptive and robust.
Our algorithm can also automatically adjust the number of pulses required to generate the target gate.
arXiv Detail & Related papers (2024-06-11T13:04:30Z)
- A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity [50.387179833629254]
We introduce a collaborative classical-quantum architecture called co-TenQu.
Co-TenQu enhances a classical deep neural network by up to 41.72% in a fair setting.
It outperforms other quantum-based methods by up to 1.9 times and achieves similar accuracy while utilizing 70.59% fewer qubits.
arXiv Detail & Related papers (2024-02-23T14:09:41Z)
- Quantization Aware Factorization for Deep Neural Network Compression [20.04951101799232]
Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks.
A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy.
This motivated us to develop an algorithm that finds decomposed approximation directly with quantized factors.
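As a rough illustration of combining factorization with quantization, the sketch below takes a fixed SVD-based low-rank decomposition and uniformly quantizes its factors; the rank and bit-width are assumptions, and the paper's algorithm instead finds the decomposition directly with quantized factors rather than quantizing a fixed decomposition post hoc.

```python
# Generic sketch: low-rank SVD factorization with uniformly quantized factors.
import numpy as np

def quantize(x, bits=8):
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def factorize_and_quantize(W, rank=16, bits=8):
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]          # (out, rank)
    B = Vt[:rank, :]                    # (rank, in)
    return quantize(A, bits), quantize(B, bits)

W = np.random.randn(256, 512)
A_q, B_q = factorize_and_quantize(W)
print("relative reconstruction error:",
      np.linalg.norm(W - A_q @ B_q) / np.linalg.norm(W))
```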
arXiv Detail & Related papers (2023-08-08T21:38:02Z)
- Quantization-aware Interval Bound Propagation for Training Certifiably Robust Quantized Neural Networks [58.195261590442406]
We study the problem of training and certifying adversarially robust quantized neural networks (QNNs).
Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization.
We present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs.
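Interval bound propagation pushes an input box through the network layer by layer; a quantization-aware variant applies the bounds with the quantized weights. The minimal sketch below covers one affine + ReLU layer only; the quantizer, bit-width, and function names are assumptions, and it omits the training loss that QA-IBP builds from these bounds.

```python
# Minimal sketch: interval bound propagation through a layer whose weights
# have been uniformly quantized. Illustrates the bound computation only.
import numpy as np

def quantize(w, bits=4):
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def ibp_affine_relu(lower, upper, W, b, bits=4):
    Wq = quantize(W, bits)
    mu, r = (lower + upper) / 2, (upper - lower) / 2
    mu_out = Wq @ mu + b                 # center propagation
    r_out = np.abs(Wq) @ r               # radius propagation
    # ReLU is monotone, so the bounds pass through elementwise.
    return np.maximum(mu_out - r_out, 0), np.maximum(mu_out + r_out, 0)

x = np.random.randn(16)
eps = 0.1
W, b = np.random.randn(8, 16), np.zeros(8)
lo, hi = ibp_affine_relu(x - eps, x + eps, W, b)
print(hi - lo)   # certified output interval widths
```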
arXiv Detail & Related papers (2022-11-29T13:32:38Z)
- Quantum Robustness Verification: A Hybrid Quantum-Classical Neural Network Certification Algorithm [1.439946676159516]
In this work, we investigate the verification of ReLU networks, which involves solving many-variable mixed-integer programs (MIPs) for robustness, a computationally demanding task.
To alleviate this issue, we propose to use QC for neural network verification and introduce a hybrid quantum procedure to compute provable certificates.
We show that, in a simulated environment, our certificate is sound, and provide bounds on the minimum number of qubits necessary to approximate the problem.
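For context on why these verification instances are hard, each unstable ReLU contributes one binary variable to the MIP. A standard textbook big-M style encoding of a single ReLU y = max(0, x) with known pre-activation bounds l <= x <= u (shown only as background, not taken from this paper) is:

```latex
% Standard mixed-integer encoding of one ReLU y = max(0, x),
% given pre-activation bounds l <= x <= u and binary indicator a.
\begin{aligned}
  & y \ge 0, \qquad y \ge x,\\
  & y \le x - l\,(1 - a), \qquad y \le u\,a,\\
  & a \in \{0, 1\}, \qquad l \le x \le u.
\end{aligned}
```

Setting a = 1 forces y = x (active phase) and a = 0 forces y = 0 (inactive phase); the number of such binaries grows with network size, which is the cost the hybrid quantum procedure aims to alleviate.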
arXiv Detail & Related papers (2022-05-02T13:23:56Z)
- Optimizing Tensor Network Contraction Using Reinforcement Learning [86.05566365115729]
We propose a Reinforcement Learning (RL) approach combined with Graph Neural Networks (GNN) to address the contraction ordering problem.
The problem is extremely challenging due to the huge search space, the heavy-tailed reward distribution, and the challenging credit assignment.
We show how a carefully implemented RL-agent that uses a GNN as the basic policy construct can address these challenges.
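To see the contraction-ordering problem concretely, one can ask NumPy for a contraction path over the same tensor network; the snippet below uses NumPy's built-in greedy heuristic as a classical baseline, which is not the RL/GNN agent from the paper.

```python
# Classical baseline for contraction ordering: NumPy's greedy einsum path.
# The RL+GNN approach in the paper searches this ordering space instead.
import numpy as np

A = np.random.rand(8, 32)
B = np.random.rand(32, 64)
C = np.random.rand(64, 8)

# Ask NumPy for a contraction order and its estimated FLOP cost.
path, info = np.einsum_path('ij,jk,kl->il', A, B, C, optimize='greedy')
print(path)    # e.g. ['einsum_path', (0, 1), (0, 1)]
print(info)    # human-readable cost report for the chosen ordering
```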
arXiv Detail & Related papers (2022-04-18T21:45:13Z)
- Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
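ADMM-based quantization alternates between an unconstrained weight update, a projection onto the quantized set, and a dual update. Below is a minimal sketch of that loop on a toy least-squares objective; the objective, step size, and penalty are placeholders, not the RNNLM training setup.

```python
# Hedged sketch of ADMM quantization on a toy problem:
#   minimize ||X w - y||^2  subject to  w in a low-bit quantized set.
# Splitting: w (continuous), q (quantized copy), u (scaled dual variable).
import numpy as np

def project_quantized(v, bits=2):
    """Project onto a symmetric uniform grid with 2^bits levels."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(v)) / qmax if np.max(np.abs(v)) > 0 else 1.0
    return np.clip(np.round(v / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 20)), rng.normal(size=100)
w, q, u = np.zeros(20), np.zeros(20), np.zeros(20)
rho, lr = 1.0, 1e-3

for _ in range(200):
    # 1) w-update: gradient step on loss + (rho/2)||w - q + u||^2
    grad = 2 * X.T @ (X @ w - y) + rho * (w - q + u)
    w -= lr * grad
    # 2) q-update: project the shifted weights onto the quantized set
    q = project_quantized(w + u)
    # 3) dual update
    u += w - q

print("constraint violation:", np.linalg.norm(w - q))
```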
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
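DropBits is described as dropping bits rather than neurons; as a loose interpretation of that idea (the paper's exact formulation differs), one can randomly lower the effective bit-width of a weight tensor during each training step, as in the hypothetical helper below.

```python
# Loose sketch of a bit-drop regularizer: with some probability, quantize a
# weight tensor at a randomly reduced bit-width for this forward pass.
# The actual DropBits mechanism in the paper is more refined than this.
import numpy as np

def quantize(w, bits):
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def drop_bits(w, base_bits=8, drop_prob=0.5, rng=np.random.default_rng()):
    bits = base_bits
    if rng.random() < drop_prob:
        bits = rng.integers(2, base_bits)   # drop to somewhere in 2..base_bits-1
    return quantize(w, int(bits))

w = np.random.randn(128, 128)
print(np.unique(drop_bits(w)).size)  # fewer distinct values when bits are dropped
```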
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- One Model for All Quantization: A Quantized Network Supporting Hot-Swap Bit-Width Adjustment [36.75157407486302]
We propose a method to train a model for all quantization that supports diverse bit-widths.
We use wavelet decomposition and reconstruction to increase the diversity of weights.
Our method can achieve accuracy comparable to dedicated models trained at the same precision.
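A hot-swappable quantized layer keeps one stored weight tensor and re-quantizes it at whatever bit-width is requested at inference time. The class below is a minimal sketch of that interface only; it does not include the wavelet decomposition/reconstruction the paper uses to make a single model perform well at every precision.

```python
# Minimal sketch of hot-swap bit-width adjustment: one stored weight tensor,
# re-quantized on demand at the requested precision.
import numpy as np

class HotSwapLinear:
    def __init__(self, in_features, out_features):
        self.W = np.random.randn(out_features, in_features) * 0.05

    def forward(self, x, bits=8):
        qmax = 2 ** (bits - 1) - 1
        scale = np.max(np.abs(self.W)) / qmax
        Wq = np.clip(np.round(self.W / scale), -qmax, qmax) * scale
        return x @ Wq.T

layer = HotSwapLinear(64, 32)
x = np.random.randn(4, 64)
for bits in (8, 4, 2):                   # swap precision without retraining
    print(bits, layer.forward(x, bits).shape)
```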
arXiv Detail & Related papers (2021-05-04T08:10:50Z)
- Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we deduce the optimal condition of both the binary and multi-level gradient quantization for any gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively.
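An unbiased multi-level gradient quantizer rounds each coordinate stochastically between the two nearest levels so that its expectation equals the original value. The sketch below is a generic QSGD-style quantizer with uniform levels, not the paper's ORQ/BinGrad with their optimized level placement.

```python
# Generic unbiased multi-level stochastic gradient quantizer (QSGD-style).
# Each coordinate is rounded up or down to the nearest level with probabilities
# chosen so that E[quantized] = original. Uniform levels are used for simplicity;
# the paper derives the optimal level placement instead.
import numpy as np

def stochastic_quantize(g, levels=4, rng=np.random.default_rng()):
    norm = np.max(np.abs(g))
    if norm == 0:
        return g
    scaled = np.abs(g) / norm * levels          # in [0, levels]
    floor = np.floor(scaled)
    prob_up = scaled - floor                    # P(round up) keeps the estimate unbiased
    rounded = floor + (rng.random(g.shape) < prob_up)
    return np.sign(g) * rounded / levels * norm

g = np.random.randn(10000)
gq = stochastic_quantize(g)
print("empirical bias:", abs(np.mean(gq - g)))  # close to zero on average
```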
arXiv Detail & Related papers (2020-02-25T18:28:39Z)