HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs
- URL: http://arxiv.org/abs/2007.09952v1
- Date: Mon, 20 Jul 2020 09:02:09 GMT
- Title: HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs
- Authors: Hai Victor Habi, Roy H. Jennings, Arnon Netzer
- Abstract summary: We introduce the Hardware Friendly Mixed Precision Quantization Block (HMQ).
HMQ is a mixed precision quantization block that repurposes the Gumbel-Softmax estimator into a smooth estimator of a pair of quantization parameters.
We apply HMQs to quantize classification models trained on CIFAR10 and ImageNet.
- Score: 7.219077740523684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work in network quantization produced state-of-the-art results using
mixed precision quantization. An imperative requirement for many efficient edge
device hardware implementations is that their quantizers are uniform and with
power-of-two thresholds. In this work, we introduce the Hardware Friendly Mixed
Precision Quantization Block (HMQ) in order to meet this requirement. The HMQ
is a mixed precision quantization block that repurposes the Gumbel-Softmax
estimator into a smooth estimator of a pair of quantization parameters, namely,
bit-width and threshold. HMQs use this to search over a finite space of
quantization schemes. Empirically, we apply HMQs to quantize classification
models trained on CIFAR10 and ImageNet. For ImageNet, we quantize four
different architectures and show that, in spite of the added restrictions to
our quantization scheme, we achieve competitive and, in some cases,
state-of-the-art results.
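The following is a minimal, hypothetical PyTorch sketch of the mechanism the abstract describes (not the authors' code): a symmetric uniform quantizer whose bit-width and power-of-two threshold are selected from finite candidate sets by a Gumbel-Softmax, so the selection stays differentiable during training. The candidate sets and the soft blend over all candidates are illustrative assumptions.
```python
# Hypothetical HMQ-style sketch: a uniform quantizer whose bit-width and
# power-of-two threshold are chosen by a Gumbel-Softmax over candidates.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HMQSketch(nn.Module):
    def __init__(self, bit_widths=(2, 4, 8), threshold_exps=(-2, -1, 0, 1, 2)):
        super().__init__()
        self.bit_widths = list(bit_widths)                    # candidate bit-widths
        self.thresholds = [2.0 ** e for e in threshold_exps]  # power-of-two thresholds
        # Learnable logits over the joint (bit-width, threshold) search space.
        self.logits = nn.Parameter(torch.zeros(len(self.bit_widths), len(self.thresholds)))

    def forward(self, x, tau=1.0):
        # Soft one-hot selection over all candidate (bit-width, threshold) pairs.
        probs = F.gumbel_softmax(self.logits.flatten(), tau=tau).view(self.logits.shape)
        out = torch.zeros_like(x)
        for i, bits in enumerate(self.bit_widths):
            levels = float(2 ** (bits - 1) - 1)               # symmetric signed grid
            for j, t in enumerate(self.thresholds):
                step = t / levels
                q = torch.clamp(torch.round(x / step), -levels, levels) * step
                q = x + (q - x).detach()                      # straight-through rounding
                out = out + probs[i, j] * q
        return out

# Usage: wrap a weight tensor during training; annealing tau toward 0 hardens the
# soft selection into a single uniform quantizer with a power-of-two threshold.
w = torch.randn(64, 32, 3, 3)
print(HMQSketch()(w).shape)
```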
Related papers
- Quantum Deep Equilibrium Models [1.5853439776721878]
We present Quantum Deep Equilibrium Models (QDEQ), a training paradigm that learns parameters of a quantum machine learning model.
We find that QDEQ is not only competitive with comparable existing baseline models, but also achieves higher performance than a network with 5 times more layers.
This demonstrates that the QDEQ paradigm can be used to develop significantly more shallow quantum circuits for a given task.
arXiv Detail & Related papers (2024-10-31T13:54:37Z)
- Adaptive quantization with mixed-precision based on low-cost proxy [8.527626602939105]
This paper proposes a novel model quantization method, named the Low-Cost Proxy-Based Adaptive Mixed-Precision Model Quantization (LCPAQ).
The hardware-aware module is designed by considering the hardware limitations, while an adaptive mixed-precision quantization module is developed to evaluate the quantization sensitivity.
Experiments on ImageNet demonstrate that the proposed LCPAQ achieves comparable or superior quantization accuracy to existing mixed-precision models.
arXiv Detail & Related papers (2024-02-27T17:36:01Z)
- A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity [50.387179833629254]
We introduce a collaborative classical-quantum architecture called co-TenQu.
Co-TenQu enhances a classical deep neural network by up to 41.72% in a fair setting.
It outperforms other quantum-based methods by up to 1.9 times and achieves similar accuracy while utilizing 70.59% fewer qubits.
arXiv Detail & Related papers (2024-02-23T14:09:41Z)
- MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search [7.564770908909927]
Quantization is a technique for creating efficient Deep Neural Networks (DNNs).
We propose MixQuant, a search algorithm that finds the optimal custom quantization bit-width for each layer weight based on roundoff error.
We show that combining MixQuant with BRECQ, a state-of-the-art quantization method, yields better quantized model accuracy than BRECQ alone.
arXiv Detail & Related papers (2023-09-29T15:49:54Z)
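A toy sketch of the roundoff-error-driven, per-layer bit-width search described in the MixQuant entry above (an interpretation, not the paper's algorithm); the candidate bit-widths and the relative-error tolerance are illustrative assumptions.
```python
# Hypothetical per-layer bit-width search driven by roundoff (MSE) error.
import numpy as np

def quantize_sym(w, bits):
    """Symmetric uniform quantization of a weight tensor to a given bit-width."""
    levels = 2 ** (bits - 1) - 1
    scale = max(np.max(np.abs(w)) / levels, 1e-12)
    return np.clip(np.round(w / scale), -levels, levels) * scale

def search_bit_widths(layer_weights, candidates=(2, 3, 4, 8), rel_tol=1e-3):
    """Pick, per layer, the smallest bit-width whose relative roundoff error is acceptable."""
    choices = {}
    for name, w in layer_weights.items():
        power = float(np.mean(w ** 2)) + 1e-12
        for bits in sorted(candidates):
            err = float(np.mean((w - quantize_sym(w, bits)) ** 2)) / power
            if err <= rel_tol:
                choices[name] = bits
                break
        else:
            choices[name] = max(candidates)   # no candidate met the tolerance
    return choices

rng = np.random.default_rng(0)
layers = {"conv1": rng.normal(size=(64, 3, 3, 3)), "fc": 0.01 * rng.normal(size=(10, 512))}
print(search_bit_widths(layers))   # prints the chosen bit-width per layer
```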
- On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks [52.97107229149988]
We propose an On-Chip Hardware-Aware Quantization framework, performing hardware-aware mixed-precision quantization on deployed edge devices.
For efficiency metrics, we built an On-Chip Quantization Aware pipeline, which allows the quantization process to perceive the actual hardware efficiency of the quantization operator.
For accuracy metrics, we propose Mask-Guided Quantization Estimation technology to effectively estimate the accuracy impact of operators in the on-chip scenario.
arXiv Detail & Related papers (2023-09-05T04:39:34Z)
- Scaling Limits of Quantum Repeater Networks [62.75241407271626]
Quantum networks (QNs) are a promising platform for secure communications, enhanced sensing, and efficient distributed quantum computing.
Due to the fragile nature of quantum states, these networks face significant challenges in terms of scalability.
In this paper, the scaling limits of quantum repeater networks (QRNs) are analyzed.
arXiv Detail & Related papers (2023-05-15T14:57:01Z)
- Modular Quantization-Aware Training for 6D Object Pose Estimation [52.9436648014338]
Edge applications demand efficient 6D object pose estimation on resource-constrained embedded platforms.
We introduce Modular Quantization-Aware Training (MQAT), an adaptive and mixed-precision quantization-aware training strategy.
MQAT guides a systematic gradated modular quantization sequence and determines module-specific bit precisions, leading to quantized models that outperform those produced by state-of-the-art uniform and mixed-precision quantization techniques.
arXiv Detail & Related papers (2023-03-12T21:01:54Z)
- HPTQ: Hardware-Friendly Post Training Quantization [6.515659231669797]
We introduce a hardware-friendly post training quantization (HPTQ) framework.
We perform a large-scale study on four tasks: classification, object detection, semantic segmentation and pose estimation.
Our experiments show that competitive results can be obtained under hardware-friendly constraints.
arXiv Detail & Related papers (2021-09-19T12:45:01Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
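One plausible, hypothetical reading of the bit-drop idea in the entry above (not the paper's DropBits implementation): during training, a weight tensor is occasionally quantized with a randomly reduced bit-width, so the dropout-style regularization acts on bits rather than neurons. The probabilities and bit-width ranges below are illustrative assumptions.
```python
# Hypothetical bit-drop regularizer: randomly discard low-order bits at training time.
import numpy as np

def quantize_sym(w, bits):
    levels = 2 ** (bits - 1) - 1
    scale = max(np.max(np.abs(w)) / levels, 1e-12)
    return np.clip(np.round(w / scale), -levels, levels) * scale

def bit_drop(w, base_bits=8, drop_prob=0.2, max_drop=4, rng=None):
    """With probability drop_prob, quantize with a randomly reduced bit-width."""
    rng = rng or np.random.default_rng()
    bits = base_bits
    if rng.random() < drop_prob:
        bits = max(2, base_bits - int(rng.integers(1, max_drop + 1)))
    return quantize_sym(w, bits)

w = np.random.default_rng(1).normal(size=(128, 64))
print(bit_drop(w, rng=np.random.default_rng(2)).shape)   # same-shaped quantized array
```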
- One Model for All Quantization: A Quantized Network Supporting Hot-Swap Bit-Width Adjustment [36.75157407486302]
We propose a method to train a model for all quantization that supports diverse bit-widths.
We use wavelet decomposition and reconstruction to increase the diversity of weights.
Our method can achieve accuracy comparable to dedicated models trained at the same precision.
arXiv Detail & Related papers (2021-05-04T08:10:50Z)
- Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of the two sides.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z)
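A rough, hypothetical illustration of the bit-inheritance idea in the last entry (not the paper's scheme): the clipping range learned at a higher bit-width is preserved when moving to a lower one, so the lower-bit step size can be derived rather than relearned from scratch. The step size and bit-widths are illustrative assumptions.
```python
# Hypothetical bit-inheritance: derive a lower-bit step size from a higher-bit one.
import numpy as np

def uniform_quantize(w, step, bits):
    levels = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / step), -levels, levels) * step

def inherit_step(step_high, bits_high, bits_low):
    """Preserve the clipping range = step * (2^(b-1) - 1) when reducing the bit-width."""
    range_high = step_high * (2 ** (bits_high - 1) - 1)
    return range_high / (2 ** (bits_low - 1) - 1)

w = np.random.default_rng(0).normal(scale=0.1, size=1000)
step8 = 0.004                                  # step size assumed learned at 8 bits
step4 = inherit_step(step8, bits_high=8, bits_low=4)
mse4 = np.mean((w - uniform_quantize(w, step4, 4)) ** 2)
print(f"inherited 4-bit step: {step4:.4f}, quantization MSE: {mse4:.6f}")
```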
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.