Automatic low-bit hybrid quantization of neural networks through meta learning
- URL: http://arxiv.org/abs/2004.11506v1
- Date: Fri, 24 Apr 2020 02:01:26 GMT
- Title: Automatic low-bit hybrid quantization of neural networks through meta learning
- Authors: Tao Wang, Junsong Wang, Chang Xu and Chao Xue
- Abstract summary: We employ the meta learning method to automatically realize low-bit hybrid quantization of neural networks.
A MetaQuantNet, together with a Quantization function, is trained to generate the quantized weights for the target DNN.
With the best searched quantization policy, we subsequently retrain or finetune to further improve the performance of the quantized target network.
- Score: 22.81983466720024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model quantization is a widely used technique to compress and accelerate deep
neural network (DNN) inference, especially when deploying to edge or IoT
devices with limited computation capacity and power consumption budgets. Uniform bit-width quantization across all layers is usually sub-optimal, and exploring hybrid quantization for different layers is vital for efficient deep compression. In this paper, we employ a meta learning method
to automatically realize low-bit hybrid quantization of neural networks. A
MetaQuantNet, together with a Quantization function, is trained to generate
the quantized weights for the target DNN. Then, we apply a genetic algorithm to
search for the best hybrid quantization policy that meets the compression constraints.
With the best searched quantization policy, we subsequently retrain or finetune
to further improve the performance of the quantized target network. Extensive experiments demonstrate that the performance of the searched hybrid quantization scheme surpasses that of its uniform bit-width counterpart. Compared to the existing reinforcement learning (RL) based hybrid quantization search approach, which relies on tedious exploration, our meta learning approach is more efficient and effective for any compression requirement, since the MetaQuantNet only needs to be trained once.
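To make the search setup concrete, here is a minimal sketch of the two ingredients the abstract describes: a per-layer low-bit quantization function and a genetic search over per-layer bit-widths under a compression budget. It is an illustrative stand-in, not the paper's MetaQuantNet; the fitness proxy (quantization error), the budget, and all hyper-parameters are assumptions.

```python
# Hedged sketch: per-layer uniform quantization + genetic search over bit-widths.
# The fitness proxy, budget, and GA hyper-parameters are illustrative placeholders,
# not the paper's MetaQuantNet / Quantization function.
import random
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax if np.max(np.abs(w)) > 0 else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def fitness(policy, layers, budget_bits=4.0):
    """Placeholder fitness: negative quantization error, invalid if over budget."""
    if np.mean(policy) > budget_bits:          # compression constraint
        return -np.inf
    err = sum(np.mean((w - quantize_uniform(w, b)) ** 2)
              for w, b in zip(layers, policy))
    return -err

def genetic_search(layers, pop_size=20, generations=30, bit_choices=(2, 3, 4, 6, 8)):
    n = len(layers)
    pop = [[random.choice(bit_choices) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, layers), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)               # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.2:                  # mutation
                child[random.randrange(n)] = random.choice(bit_choices)
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda p: fitness(p, layers))

# Example: search a hybrid bit-width policy for three random "layers".
layers = [np.random.randn(64, 64) for _ in range(3)]
print(genetic_search(layers))
```

In the paper's pipeline, candidate policies are evaluated with quantized weights generated by the trained MetaQuantNet rather than with a simple reconstruction-error proxy, which is what removes the need for per-candidate retraining during the search.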
Related papers
- A generic and robust quantum agent inspired by deep meta-reinforcement learning [4.881040823544883]
We develop a new training algorithm inspired by deep meta-reinforcement learning (deep meta-RL).
The trained neural network is adaptive and robust.
Our algorithm can also automatically adjust the number of pulses required to generate the target gate.
arXiv Detail & Related papers (2024-06-11T13:04:30Z)
- A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity [50.387179833629254]
We introduce a collaborative classical-quantum architecture called co-TenQu.
Co-TenQu enhances a classical deep neural network by up to 41.72% in a fair setting.
It outperforms other quantum-based methods by up to 1.9 times and achieves similar accuracy while utilizing 70.59% fewer qubits.
arXiv Detail & Related papers (2024-02-23T14:09:41Z)
- Quantization Aware Factorization for Deep Neural Network Compression [20.04951101799232]
Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks.
A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy.
This motivated us to develop an algorithm that finds decomposed approximation directly with quantized factors.
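As a rough illustration of combining factorization with quantization, the sketch below takes a fixed SVD-based low-rank decomposition and uniformly quantizes its factors; the rank and bit-width are assumptions, and the paper's algorithm instead finds the decomposition directly with quantized factors rather than quantizing a fixed decomposition post hoc.

```python
# Generic sketch: low-rank SVD factorization with uniformly quantized factors.
import numpy as np

def quantize(x, bits=8):
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def factorize_and_quantize(W, rank=16, bits=8):
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]          # (out, rank)
    B = Vt[:rank, :]                    # (rank, in)
    return quantize(A, bits), quantize(B, bits)

W = np.random.randn(256, 512)
A_q, B_q = factorize_and_quantize(W)
print("relative reconstruction error:",
      np.linalg.norm(W - A_q @ B_q) / np.linalg.norm(W))
```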
arXiv Detail & Related papers (2023-08-08T21:38:02Z)
- Quantization-aware Interval Bound Propagation for Training Certifiably Robust Quantized Neural Networks [58.195261590442406]
We study the problem of training and certifying adversarially robust quantized neural networks (QNNs).
Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization.
We present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs.
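Interval bound propagation pushes an input box through the network layer by layer; a quantization-aware variant applies the bounds with the quantized weights. The minimal sketch below covers one affine + ReLU layer only; the quantizer, bit-width, and function names are assumptions, and it omits the training loss that QA-IBP builds from these bounds.

```python
# Minimal sketch: interval bound propagation through a layer whose weights
# have been uniformly quantized. Illustrates the bound computation only.
import numpy as np

def quantize(w, bits=4):
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def ibp_affine_relu(lower, upper, W, b, bits=4):
    Wq = quantize(W, bits)
    mu, r = (lower + upper) / 2, (upper - lower) / 2
    mu_out = Wq @ mu + b                 # center propagation
    r_out = np.abs(Wq) @ r               # radius propagation
    # ReLU is monotone, so the bounds pass through elementwise.
    return np.maximum(mu_out - r_out, 0), np.maximum(mu_out + r_out, 0)

x = np.random.randn(16)
eps = 0.1
W, b = np.random.randn(8, 16), np.zeros(8)
lo, hi = ibp_affine_relu(x - eps, x + eps, W, b)
print(hi - lo)   # certified output interval widths
```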
arXiv Detail & Related papers (2022-11-29T13:32:38Z)
- Quantum Robustness Verification: A Hybrid Quantum-Classical Neural Network Certification Algorithm [1.439946676159516]
In this work, we investigate the verification of ReLU networks, which involves solving many-variable mixed-integer programs (MIPs) for robustness, a computationally demanding task.
To alleviate this issue, we propose to use QC for neural network verification and introduce a hybrid quantum procedure to compute provable certificates.
We show that, in a simulated environment, our certificate is sound, and provide bounds on the minimum number of qubits necessary to approximate the problem.
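For context on why these verification instances are hard, each unstable ReLU contributes one binary variable to the MIP. A standard textbook big-M style encoding of a single ReLU y = max(0, x) with known pre-activation bounds l <= x <= u (shown only as background, not taken from this paper) is:

```latex
% Standard mixed-integer encoding of one ReLU y = max(0, x),
% given pre-activation bounds l <= x <= u and binary indicator a.
\begin{aligned}
  & y \ge 0, \qquad y \ge x,\\
  & y \le x - l\,(1 - a), \qquad y \le u\,a,\\
  & a \in \{0, 1\}, \qquad l \le x \le u.
\end{aligned}
```

Setting a = 1 forces y = x (active phase) and a = 0 forces y = 0 (inactive phase); the number of such binaries grows with network size, which is the cost the hybrid quantum procedure aims to alleviate.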
arXiv Detail & Related papers (2022-05-02T13:23:56Z)
- Optimizing Tensor Network Contraction Using Reinforcement Learning [86.05566365115729]
We propose a Reinforcement Learning (RL) approach combined with Graph Neural Networks (GNN) to address the contraction ordering problem.
The problem is extremely challenging due to the huge search space, the heavy-tailed reward distribution, and the challenging credit assignment.
We show how a carefully implemented RL-agent that uses a GNN as the basic policy construct can address these challenges.
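To see the contraction-ordering problem concretely, one can ask NumPy for a contraction path over the same tensor network; the snippet below uses NumPy's built-in greedy heuristic as a classical baseline, which is not the RL/GNN agent from the paper.

```python
# Classical baseline for contraction ordering: NumPy's greedy einsum path.
# The RL+GNN approach in the paper searches this ordering space instead.
import numpy as np

A = np.random.rand(8, 32)
B = np.random.rand(32, 64)
C = np.random.rand(64, 8)

# Ask NumPy for a contraction order and its estimated FLOP cost.
path, info = np.einsum_path('ij,jk,kl->il', A, B, C, optimize='greedy')
print(path)    # e.g. ['einsum_path', (0, 1), (0, 1)]
print(info)    # human-readable cost report for the chosen ordering
```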
arXiv Detail & Related papers (2022-04-18T21:45:13Z)
- Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
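ADMM-based quantization alternates between an unconstrained weight update, a projection onto the quantized set, and a dual update. Below is a minimal sketch of that loop on a toy least-squares objective; the objective, step size, and penalty are placeholders, not the RNNLM training setup.

```python
# Hedged sketch of ADMM quantization on a toy problem:
#   minimize ||X w - y||^2  subject to  w in a low-bit quantized set.
# Splitting: w (continuous), q (quantized copy), u (scaled dual variable).
import numpy as np

def project_quantized(v, bits=2):
    """Project onto a symmetric uniform grid with 2^bits levels."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(v)) / qmax if np.max(np.abs(v)) > 0 else 1.0
    return np.clip(np.round(v / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 20)), rng.normal(size=100)
w, q, u = np.zeros(20), np.zeros(20), np.zeros(20)
rho, lr = 1.0, 1e-3

for _ in range(200):
    # 1) w-update: gradient step on loss + (rho/2)||w - q + u||^2
    grad = 2 * X.T @ (X @ w - y) + rho * (w - q + u)
    w -= lr * grad
    # 2) q-update: project the shifted weights onto the quantized set
    q = project_quantized(w + u)
    # 3) dual update
    u += w - q

print("constraint violation:", np.linalg.norm(w - q))
```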
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
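DropBits is described as dropping bits rather than neurons; as a loose interpretation of that idea (the paper's exact formulation differs), one can randomly lower the effective bit-width of a weight tensor during each training step, as in the hypothetical helper below.

```python
# Loose sketch of a bit-drop regularizer: with some probability, quantize a
# weight tensor at a randomly reduced bit-width for this forward pass.
# The actual DropBits mechanism in the paper is more refined than this.
import numpy as np

def quantize(w, bits):
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def drop_bits(w, base_bits=8, drop_prob=0.5, rng=np.random.default_rng()):
    bits = base_bits
    if rng.random() < drop_prob:
        bits = rng.integers(2, base_bits)   # drop to somewhere in 2..base_bits-1
    return quantize(w, int(bits))

w = np.random.randn(128, 128)
print(np.unique(drop_bits(w)).size)  # fewer distinct values when bits are dropped
```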
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- One Model for All Quantization: A Quantized Network Supporting Hot-Swap Bit-Width Adjustment [36.75157407486302]
We propose a method to train a model for all quantization that supports diverse bit-widths.
We use wavelet decomposition and reconstruction to increase the diversity of weights.
Our method can achieve accuracy comparable to dedicated models trained at the same precision.
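A hot-swappable quantized layer keeps one stored weight tensor and re-quantizes it at whatever bit-width is requested at inference time. The class below is a minimal sketch of that interface only; it does not include the wavelet decomposition/reconstruction the paper uses to make a single model perform well at every precision.

```python
# Minimal sketch of hot-swap bit-width adjustment: one stored weight tensor,
# re-quantized on demand at the requested precision.
import numpy as np

class HotSwapLinear:
    def __init__(self, in_features, out_features):
        self.W = np.random.randn(out_features, in_features) * 0.05

    def forward(self, x, bits=8):
        qmax = 2 ** (bits - 1) - 1
        scale = np.max(np.abs(self.W)) / qmax
        Wq = np.clip(np.round(self.W / scale), -qmax, qmax) * scale
        return x @ Wq.T

layer = HotSwapLinear(64, 32)
x = np.random.randn(4, 64)
for bits in (8, 4, 2):                   # swap precision without retraining
    print(bits, layer.forward(x, bits).shape)
```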
arXiv Detail & Related papers (2021-05-04T08:10:50Z)
- Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training [99.42912552638168]
Communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications.
In this work, we deduce the optimal condition of both the binary and multi-level gradient quantization for any gradient distribution.
Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively.
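An unbiased multi-level gradient quantizer rounds each coordinate stochastically between the two nearest levels so that its expectation equals the original value. The sketch below is a generic QSGD-style quantizer with uniform levels, not the paper's ORQ/BinGrad with their optimized level placement.

```python
# Generic unbiased multi-level stochastic gradient quantizer (QSGD-style).
# Each coordinate is rounded up or down to the nearest level with probabilities
# chosen so that E[quantized] = original. Uniform levels are used for simplicity;
# the paper derives the optimal level placement instead.
import numpy as np

def stochastic_quantize(g, levels=4, rng=np.random.default_rng()):
    norm = np.max(np.abs(g))
    if norm == 0:
        return g
    scaled = np.abs(g) / norm * levels          # in [0, levels]
    floor = np.floor(scaled)
    prob_up = scaled - floor                    # P(round up) keeps the estimate unbiased
    rounded = floor + (rng.random(g.shape) < prob_up)
    return np.sign(g) * rounded / levels * norm

g = np.random.randn(10000)
gq = stochastic_quantize(g)
print("empirical bias:", abs(np.mean(gq - g)))  # close to zero on average
```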
arXiv Detail & Related papers (2020-02-25T18:28:39Z)