Degree-Quant: Quantization-Aware Training for Graph Neural Networks
- URL: http://arxiv.org/abs/2008.05000v3
- Date: Mon, 15 Mar 2021 15:27:59 GMT
- Title: Degree-Quant: Quantization-Aware Training for Graph Neural Networks
- Authors: Shyam A. Tailor, Javier Fernandez-Marques, Nicholas D. Lane
- Abstract summary: Graph neural networks (GNNs) have demonstrated strong performance on a wide variety of tasks.
Despite their promise, there exists little research exploring methods to make them more efficient at inference time.
We propose an architecturally-agnostic method, Degree-Quant, to improve performance over existing quantization-aware training baselines.
- Score: 10.330195866109312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph neural networks (GNNs) have demonstrated strong performance on a wide
variety of tasks due to their ability to model non-uniform structured data.
Despite their promise, there exists little research exploring methods to make
them more efficient at inference time. In this work, we explore the viability
of training quantized GNNs, enabling the usage of low precision integer
arithmetic during inference. We identify the sources of error that uniquely
arise when attempting to quantize GNNs, and propose an architecturally-agnostic
method, Degree-Quant, to improve performance over existing quantization-aware
training baselines commonly used on other architectures, such as CNNs. We
validate our method on six datasets and show, unlike previous attempts, that
models generalize to unseen graphs. Models trained with Degree-Quant for INT8
quantization perform as well as FP32 models in most cases; for INT4 models, we
obtain up to 26% gains over the baselines. Our work enables up to 4.7x speedups
on CPU when using INT8 arithmetic.
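As a concrete illustration of the kind of quantization-aware training the abstract describes, the PyTorch sketch below fake-quantizes activations (quantize-dequantize with a straight-through estimator) and stochastically protects high in-degree nodes from quantization during training, in the spirit of Degree-Quant. The layer structure, masking probabilities, and mean aggregation are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: fake quantization with a straight-through estimator (STE)
# plus degree-based protective masking. Hyperparameters and layer shape are
# assumed for illustration only.
import torch
import torch.nn as nn


def fake_quant(x, num_bits=8):
    """Uniform quantize-dequantize; gradients pass through unchanged (STE)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return x + (x_q - x).detach()  # forward: x_q, backward: identity


class QuantGraphLayer(nn.Module):
    def __init__(self, in_dim, out_dim, num_bits=8, p_min=0.0, p_max=0.2):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
        self.num_bits = num_bits
        self.p_min, self.p_max = p_min, p_max  # assumed masking-probability range

    def forward(self, x, adj):
        # adj: dense 0/1 adjacency [N, N]. Higher-degree nodes accumulate more
        # quantization error, so they are protected with higher probability.
        deg = adj.sum(dim=1)
        p = self.p_min + (self.p_max - self.p_min) * deg / deg.max().clamp(min=1)
        if self.training:
            protect = (torch.rand_like(p) < p).float().unsqueeze(-1)
        else:
            protect = torch.zeros_like(p).unsqueeze(-1)

        h = self.lin(x)
        h = protect * h + (1.0 - protect) * fake_quant(h, self.num_bits)

        # Mean aggregation over neighbours, then quantize the aggregated output.
        agg = adj @ h / deg.clamp(min=1).unsqueeze(-1)
        return fake_quant(agg, self.num_bits)


# Toy usage: 5 nodes with 8 features each, INT8-like precision.
x = torch.randn(5, 8)
adj = (torch.rand(5, 5) > 0.5).float()
print(QuantGraphLayer(8, 16)(x, adj).shape)  # torch.Size([5, 16])
```

At inference time the protective masking is switched off and the simulated quantization is replaced by genuine low-precision integer arithmetic, which is what the reported CPU speedups rely on.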
Related papers
- Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation [3.4606942690643336]
We introduce a novel approach for DNN quantization that uses a redundant representation of the DNN's output.
We demonstrate that this mapping can reduce quantization error.
Our approach can be applied to other tasks, including segmentation, object detection, and key-points prediction.
arXiv Detail & Related papers (2024-05-22T21:59:46Z)
- Efficient Post-training Quantization with FP8 Formats [14.543387418837154]
We study the advantages of FP8 data formats for post-training quantization across 75 unique network architectures.
E4M3 is better suited for NLP models, whereas E3M4 performs marginally better than E4M3 on computer vision tasks. A rough sketch of these two formats follows below.
arXiv Detail & Related papers (2023-09-26T00:58:36Z)
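For readers unfamiliar with FP8, the rough helper below rounds FP32 values onto a sign/exponent/mantissa grid such as E4M3 (4 exponent, 3 mantissa bits) or E3M4. It is not code from the paper, and it ignores subnormals, NaN/Inf encodings, and the exact saturation rules of the real formats.

```python
# Rough FP8 simulation in FP32: keep `man_bits` mantissa bits at each exponent.
# Simplified, assumed behaviour only (no subnormals or special values).
import torch


def fake_quant_fp8(x, exp_bits=4, man_bits=3):
    bias = 2 ** (exp_bits - 1) - 1               # IEEE-style exponent bias
    max_exp = (2 ** exp_bits - 1) - bias         # crude largest exponent
    min_exp = 1 - bias                           # smallest normal exponent
    sign = torch.sign(x)
    mag = x.abs().clamp(min=1e-30)
    e = torch.floor(torch.log2(mag)).clamp(min_exp, max_exp)
    step = 2.0 ** (e - man_bits)                 # spacing of the grid at exponent e
    y = sign * torch.round(mag / step) * step
    max_val = (2.0 - 2.0 ** (-man_bits)) * 2.0 ** max_exp
    return y.clamp(-max_val, max_val)            # saturate out-of-range values


t = torch.randn(4) * 10
print(fake_quant_fp8(t, exp_bits=4, man_bits=3))  # E4M3-like: more range
print(fake_quant_fp8(t, exp_bits=3, man_bits=4))  # E3M4-like: more precision
```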
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling-based training scheme, named EnGCN, to address these issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- GHN-Q: Parameter Prediction for Unseen Quantized Convolutional Architectures via Graph Hypernetworks [80.29667394618625]
We conduct the first-ever study exploring the use of graph hypernetworks for predicting parameters of unseen quantized CNN architectures.
We focus on a reduced CNN search space and find that GHN-Q can in fact predict quantization-robust parameters for various 8-bit quantized CNNs.
arXiv Detail & Related papers (2022-08-26T08:00:02Z)
- FxP-QNet: A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs with Dynamic Fixed-Point Representation [2.4149105714758545]
We propose a novel framework referred to as the Fixed-Point Quantizer of Deep Neural Networks (FxP-QNet).
FxP-QNet adapts the quantization level for each data-structure of each layer based on the trade-off between the network accuracy and the low-precision requirements.
Results show that FxP-QNet-quantized AlexNet, VGG-16, and ResNet-18 reduce the overall memory requirements of their full-precision counterparts by 7.16x, 10.36x, and 6.44x with less than 0.95%, 0.95%, and 1.99% accuracy drop, respectively. A simplified fixed-point rounding sketch follows this entry.
arXiv Detail & Related papers (2022-03-22T23:01:43Z)
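The accuracy-driven bit-allocation search of FxP-QNet is not reproduced here, but the sketch below shows the basic dynamic fixed-point rounding it builds on: a fractional length is chosen per tensor so that the largest magnitude still fits in the word. The heuristic and the function name are assumptions made for illustration.

```python
# Dynamic fixed-point rounding sketch: pick an integer/fractional bit split per
# tensor from its dynamic range, then round to that grid. Assumed heuristic.
import torch


def fixed_point_quant(x, total_bits=8):
    qmax = 2 ** (total_bits - 1) - 1
    max_abs = x.abs().max().clamp(min=1e-8)
    int_bits = torch.ceil(torch.log2(max_abs)).int().item()  # bits for the integer part
    frac_bits = (total_bits - 1) - int_bits                   # the rest is fractional
    step = 2.0 ** (-frac_bits)
    return torch.clamp(torch.round(x / step), -qmax - 1, qmax) * step


w = torch.randn(3, 3)
print(fixed_point_quant(w, total_bits=8))  # finer 8-bit approximation
print(fixed_point_quant(w, total_bits=4))  # coarser 4-bit approximation
```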
- Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition [65.7040645560855]
We propose Q-ASR, an integer-only, zero-shot quantization scheme for ASR models.
We show negligible WER change as compared to the full-precision baseline models.
Q-ASR exhibits a large compression rate of more than 4x with small WER degradation.
arXiv Detail & Related papers (2021-03-31T06:05:40Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks. A minimal binarization sketch follows this entry.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
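One binarization strategy of the kind the entry above evaluates can be sketched as follows: weights and node features pass through sign() in the forward pass, gradients flow via a straight-through estimator, and a per-tensor scaling factor preserves magnitude. This is an illustrative variant under those assumptions, not the paper's reference implementation.

```python
# Minimal binary GNN layer sketch: sign() binarization with an STE and a
# per-tensor weight scale, followed by mean aggregation. Assumed design.
import torch
import torch.nn as nn


def binarize(x):
    """sign() in the forward pass, identity gradient in the backward pass."""
    return x + (torch.sign(x) - x).detach()


class BinaryGraphLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_dim, out_dim) * 0.1)

    def forward(self, x, adj):
        alpha = self.weight.abs().mean()     # per-tensor scale for binary weights
        h = binarize(x) @ (binarize(self.weight) * alpha)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        return adj @ h / deg                 # mean aggregation over neighbours


x = torch.randn(6, 16)
adj = (torch.rand(6, 6) > 0.5).float()
print(BinaryGraphLayer(16, 32)(x, adj).shape)  # torch.Size([6, 32])
```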
- Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of both.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z)
- Learned Low Precision Graph Neural Networks [10.269500440688306]
We show how to systematically quantise Deep Graph Neural Networks (GNNs) with minimal or no loss in performance using Network Architecture Search (NAS).
The proposed novel NAS mechanism, named Low Precision Graph NAS (LPGNAS), constrains both architecture and quantisation choices to be differentiable.
On eight different datasets, solving the task of classifying unseen nodes in a graph, LPGNAS generates quantised models with significant reductions in both model and buffer sizes.
arXiv Detail & Related papers (2020-09-19T13:51:09Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences of its use.