GHN-QAT: Training Graph Hypernetworks to Predict Quantization-Robust
Parameters of Unseen Limited Precision Neural Networks
- URL: http://arxiv.org/abs/2309.13773v1
- Date: Sun, 24 Sep 2023 23:01:00 GMT
- Title: GHN-QAT: Training Graph Hypernetworks to Predict Quantization-Robust
Parameters of Unseen Limited Precision Neural Networks
- Authors: Stone Yun, Alexander Wong
- Abstract summary: Graph Hypernetworks (GHN) can predict the parameters of varying unseen CNN architectures with surprisingly good accuracy.
Preliminary research has explored the use of GHNs to predict quantization-robust parameters for 8-bit and 4-bit quantized CNNs.
We show that quantization-aware training can significantly improve quantized accuracy for GHN-predicted parameters of 4-bit quantized CNNs.
- Score: 80.29667394618625
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Graph Hypernetworks (GHN) can predict the parameters of varying unseen CNN
architectures with surprisingly good accuracy at a fraction of the cost of
iterative optimization. Following these successes, preliminary research has
explored the use of GHNs to predict quantization-robust parameters for 8-bit
and 4-bit quantized CNNs. However, this early work leveraged full-precision
float32 training and only applied quantization at test time. We explore the
impact of quantization-aware training and other quantization-based training
strategies on the quantization robustness and performance of GHN-predicted
parameters for low-precision CNNs. We show that quantization-aware training
can significantly improve the quantized accuracy of GHN-predicted parameters
for 4-bit quantized CNNs and even yield greater-than-random accuracy for
2-bit quantized CNNs. These promising results open the door to future
explorations such as using GHN-predicted parameters as initialization for
subsequent quantized training of individual CNNs, exploring "extreme
bitwidth" quantization, and mixed-precision quantization schemes.
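
Quantization-aware training, as used above, is typically implemented by simulating low-precision arithmetic in the forward pass while keeping full-precision parameters for the gradient update; the non-differentiable rounding step is bypassed with a straight-through estimator (STE). Below is a minimal sketch of that general mechanism for 4-bit symmetric weight quantization in PyTorch. The class names (FakeQuantSTE, QuantConv2d) and all settings are illustrative assumptions, not the authors' GHN-QAT implementation.

```python
# Minimal sketch of quantization-aware training with simulated ("fake")
# quantization and a straight-through estimator (STE). Illustrative only;
# bit-width, layer shapes, and names are arbitrary assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FakeQuantSTE(torch.autograd.Function):
    """Uniform symmetric fake-quantization with a straight-through gradient."""

    @staticmethod
    def forward(ctx, x, num_bits):
        qmax = 2 ** (num_bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # STE: pass the gradient straight through the rounding step.
        return grad_output, None


class QuantConv2d(nn.Conv2d):
    """Conv layer whose weights are fake-quantized on every forward pass."""

    def __init__(self, *args, num_bits=4, **kwargs):
        super().__init__(*args, **kwargs)
        self.num_bits = num_bits

    def forward(self, x):
        w_q = FakeQuantSTE.apply(self.weight, self.num_bits)
        return F.conv2d(x, w_q, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


# Toy usage: a single 4-bit quantized conv layer trained with QAT.
layer = QuantConv2d(3, 16, kernel_size=3, padding=1, num_bits=4)
opt = torch.optim.SGD(layer.parameters(), lr=0.01)
x = torch.randn(8, 3, 32, 32)
loss = layer(x).pow(2).mean()
loss.backward()   # gradients flow through the STE to the float weights
opt.step()
```

In a GHN setting, the weights produced by the hypernetwork would play the role of the layer's weight tensor above; a standalone conv layer is used here only to keep the sketch self-contained.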
Related papers
- PD-Quant: Post-Training Quantization based on Prediction Difference Metric [43.81334288840746]
Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types.
Determining the appropriate quantization parameters is the main open problem; PD-Quant addresses this limitation by considering global information (a minimal sketch of this calibration step appears after this list).
arXiv Detail & Related papers (2022-12-14T05:48:58Z)
- GHN-Q: Parameter Prediction for Unseen Quantized Convolutional Architectures via Graph Hypernetworks [80.29667394618625]
We conduct the first-ever study exploring the use of graph hypernetworks for predicting parameters of unseen quantized CNN architectures.
We focus on a reduced CNN search space and find that GHN-Q can in fact predict quantization-robust parameters for various 8-bit quantized CNNs.
arXiv Detail & Related papers (2022-08-26T08:00:02Z)
- Quantune: Post-training Quantization of Convolutional Neural Networks using Extreme Gradient Boosting for Fast Deployment [15.720551497037176]
We propose an auto-tuner, Quantune, to accelerate the search for quantization configurations.
We show that Quantune reduces the search time for quantization by approximately 36.5x with an accuracy loss of 0.07-0.65% across six CNN models.
arXiv Detail & Related papers (2022-02-10T14:05:02Z)
- Fixed-point Quantization of Convolutional Neural Networks for Quantized Inference on Embedded Platforms [0.9954382983583577]
We propose a method to optimally quantize the weights, biases and activations of each layer of a pre-trained CNN.
We find that layer-wise quantization of parameters significantly helps in this process.
arXiv Detail & Related papers (2021-02-03T17:05:55Z)
- Where Should We Begin? A Low-Level Exploration of Weight Initialization Impact on Quantized Behaviour of Deep Neural Networks [93.4221402881609]
We present an in-depth, fine-grained ablation study of the effect of different weight initializations on the final distributions of weights and activations across different CNN architectures.
To the best of our knowledge, we are the first to perform such a low-level, in-depth quantitative analysis of weight initialization and its effect on quantized behaviour.
arXiv Detail & Related papers (2020-11-30T06:54:28Z)
- Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We propose regarding the discrete weights of an arbitrary quantized neural network as searchable variables and use a differentiable method to search them accurately.
arXiv Detail & Related papers (2020-09-18T09:13:26Z)
- APQ: Joint Search for Network Architecture, Pruning and Quantization Policy [49.3037538647714]
We present APQ for efficient deep learning inference on resource-constrained hardware.
Unlike previous methods that search the neural architecture, pruning policy, and quantization policy separately, we optimize them jointly.
At the same accuracy, APQ reduces latency/energy by 2x/1.3x over MobileNetV2+HAQ.
arXiv Detail & Related papers (2020-06-15T16:09:17Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance still lags behind that of full-precision networks.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features of the original full-precision network into high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
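
Several of the related papers above (PD-Quant, Quantune, the fixed-point quantization work) revolve around choosing per-layer quantization parameters after training. As referenced in the PD-Quant entry, the sketch below shows the basic calibration step they build on: deriving an affine scale and zero-point for a tensor from its observed range. The function names and the simple min/max rule are illustrative assumptions, not the method of any particular paper listed here.

```python
# Minimal sketch of post-training quantization calibration: derive an
# affine scale and zero-point for a tensor from its observed min/max range.
# This naive min/max rule is illustrative only; the papers above study
# smarter ways to pick these parameters (e.g. prediction-difference metrics).
import numpy as np


def affine_qparams(x, num_bits=8):
    """Return (scale, zero_point) for asymmetric uniform quantization."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # range must cover zero
    scale = (x_max - x_min) / (qmax - qmin) or 1.0   # avoid a zero scale
    zero_point = int(round(qmin - x_min / scale))
    return scale, int(np.clip(zero_point, qmin, qmax))


def fake_quantize(x, scale, zero_point, num_bits=8):
    """Quantize then dequantize x, so quantization error can be inspected."""
    q = np.clip(np.round(x / scale) + zero_point, 0, 2 ** num_bits - 1)
    return (q - zero_point) * scale


# Example: per-layer calibration of a random "weight" tensor.
w = np.random.randn(64, 32).astype(np.float32)
scale, zp = affine_qparams(w, num_bits=8)
err = np.abs(fake_quantize(w, scale, zp) - w).mean()
print(f"scale={scale:.5f}, zero_point={zp}, mean abs error={err:.5f}")
```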