Low-bit Quantization for Deep Graph Neural Networks with
Smoothness-aware Message Propagation
- URL: http://arxiv.org/abs/2308.14949v1
- Date: Tue, 29 Aug 2023 00:25:02 GMT
- Title: Low-bit Quantization for Deep Graph Neural Networks with
Smoothness-aware Message Propagation
- Authors: Shuang Wang, Bahaeddin Eravci, Rustam Guliyev, Hakan Ferhatosmanoglu
- Abstract summary: We present an end-to-end solution that aims to address these challenges for efficient GNNs in resource-constrained environments.
We introduce a quantization-based approach for all stages of GNNs, from message passing in training to node classification.
The proposed quantizer learns quantization ranges and reduces the model size with comparable accuracy even under low-bit quantization.
- Score: 3.9177379733188715
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Graph Neural Network (GNN) training and inference involve significant
challenges of scalability with respect to both model sizes and number of
layers, resulting in degradation of efficiency and accuracy for large and deep
GNNs. We present an end-to-end solution that aims to address these challenges
for efficient GNNs in resource-constrained environments while avoiding the
oversmoothing problem in deep GNNs. We introduce a quantization-based approach
for all stages of GNNs, from message passing in training to node
classification, compressing the model and enabling efficient processing. The
proposed GNN quantizer learns quantization ranges and reduces the model size
with comparable accuracy even under low-bit quantization. To scale with the
number of layers, we devise a message propagation mechanism in training that
controls layer-wise changes of similarities between neighboring nodes. This
objective is incorporated into a Lagrangian function with constraints and a
differential multiplier method is utilized to iteratively find optimal
embeddings. This mitigates oversmoothing and keeps the quantization error
within a bound. Significant improvements are demonstrated over state-of-the-art
quantization methods and deep GNN approaches in both full-precision and
quantized models. The proposed quantizer demonstrates superior performance in
INT2 configurations across all stages of the GNN, achieving a notable level of
accuracy. In contrast, existing quantization approaches fail to reach
satisfactory accuracy levels. Finally, inference with INT2 and INT4
representations exhibits speedups of 5.11$\times$ and 4.70$\times$, respectively, over the
full-precision counterparts.
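The abstract does not give the quantizer's formulation, only that quantization ranges are learned rather than fixed. As a rough illustration of that idea (not the authors' actual method), the following PyTorch sketch implements PACT-style fake quantization with a learnable clipping range `alpha` and a straight-through estimator; the class name and defaults are made up.

```python
import torch
import torch.nn as nn

class LearnedRangeQuantizer(nn.Module):
    """Fake quantization with a learnable clipping range.

    The clipping threshold ``alpha`` is a trainable parameter, so the
    quantization range adapts during training; rounding uses a
    straight-through estimator so gradients still flow. Hypothetical
    sketch, not the paper's actual quantizer.
    """

    def __init__(self, num_bits: int = 2):
        super().__init__()
        self.num_bits = num_bits
        self.alpha = nn.Parameter(torch.tensor(1.0))  # learnable range bound

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        levels = 2 ** self.num_bits - 1
        # Clip to the learned range [-alpha, alpha].
        x_clip = torch.maximum(torch.minimum(x, self.alpha), -self.alpha)
        scale = (2 * self.alpha) / levels
        # Map to integer levels, round, and map back (fake quantization).
        x_q = torch.round((x_clip + self.alpha) / scale) * scale - self.alpha
        # Straight-through estimator: forward uses x_q, backward sees x_clip.
        return x_clip + (x_q - x_clip).detach()
```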
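Similarly, the smoothness-aware propagation is described only as a constrained objective solved with a differential method of multipliers: gradient descent on the node embeddings and gradient ascent on the Lagrange multiplier. The sketch below shows that generic update pattern with a made-up layer-wise similarity constraint; all names, the constraint form, and the step sizes are placeholders, not the paper's formulation.

```python
import torch

def smoothness_constraint(z_prev, z_curr, adj, delta=0.05):
    """Hypothetical constraint: the average cosine similarity between
    neighboring nodes should not change by more than `delta` per layer."""
    def neighbor_sim(z):
        zn = torch.nn.functional.normalize(z, dim=1)
        sim = zn @ zn.t()
        return (sim * adj).sum() / adj.sum()
    return (neighbor_sim(z_curr) - neighbor_sim(z_prev)).abs() - delta

def differential_multiplier_step(loss_fn, constraint_fn, z, lam,
                                 lr_z=0.01, lr_lam=0.1):
    """One step of the basic differential method of multipliers:
    gradient descent on the embeddings z and gradient ascent on the
    multiplier lam, for the constraint constraint_fn(z) <= 0.
    Illustrative only."""
    z = z.detach().requires_grad_(True)
    lagrangian = loss_fn(z) + lam * constraint_fn(z)
    (grad_z,) = torch.autograd.grad(lagrangian, z)
    with torch.no_grad():
        z_new = z - lr_z * grad_z                                    # descent on embeddings
        lam_new = (lam + lr_lam * constraint_fn(z_new)).clamp_min(0.0)  # ascent on multiplier
    return z_new, lam_new
```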
Related papers
- Gradient-based Automatic Mixed Precision Quantization for Neural Networks On-Chip [0.9187138676564589]
We present High Granularity Quantization (HGQ), an innovative quantization-aware training method.
HGQ fine-tunes the per-weight and per-activation precision by making them optimizable through gradient descent.
This approach enables ultra-low latency and low-power neural networks on hardware capable of performing arithmetic operations.
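As summarized, HGQ makes the precision itself optimizable by gradient descent. One generic way to do that (not necessarily HGQ's formulation) is to keep a continuous bit-width parameter and round it with a straight-through estimator, as in this hypothetical sketch:

```python
import torch
import torch.nn as nn

class TrainableBitwidth(nn.Module):
    """Continuous bit-width parameter rounded with a straight-through
    estimator, so precision can be tuned by gradient descent.
    Generic illustration, not HGQ's actual method."""

    def __init__(self, init_bits: float = 8.0):
        super().__init__()
        self.bits = nn.Parameter(torch.tensor(init_bits))

    def forward(self) -> torch.Tensor:
        # Forward uses the rounded (integer) bit-width; backward sees the
        # continuous parameter, so gradients can shrink or grow precision.
        return self.bits + (torch.round(self.bits) - self.bits).detach()
```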
arXiv Detail & Related papers (2024-05-01T17:18:46Z) - $\rm A^2Q$: Aggregation-Aware Quantization for Graph Neural Networks [18.772128348519566]
We propose the Aggregation-Aware mixed-precision Quantization ($\rm A^2Q$) for Graph Neural Networks (GNNs).
Our method can achieve up to 11.4% and 9.5% accuracy improvements on the node-level and graph-level tasks, respectively, and up to 2x speedup on a dedicated hardware accelerator.
arXiv Detail & Related papers (2023-02-01T02:54:35Z) - Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural
Networks [52.566735716983956]
We propose a graph gradual pruning framework termed CGP to dynamically prune GNNs.
Unlike LTH-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs.
Our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
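For a sense of what gradual pruning during training looks like in general, here is a generic magnitude-based sketch with a polynomial sparsity schedule; it is not CGP's graph-specific criterion, and all names are placeholders:

```python
import torch

def gradual_sparsity(step: int, total_steps: int, final_sparsity: float = 0.9) -> float:
    """Polynomial schedule: sparsity grows smoothly from 0 to final_sparsity."""
    frac = min(step / total_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - frac) ** 3)

def prune_by_magnitude(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude fraction of weights; returns a 0/1 mask."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()
```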
arXiv Detail & Related papers (2022-07-18T14:23:31Z) - Edge Inference with Fully Differentiable Quantized Mixed Precision
Neural Networks [1.131071436917293]
Quantizing parameters and operations to lower bit-precision offers substantial memory and energy savings for neural network inference.
This paper proposes a new quantization approach for mixed-precision convolutional neural networks (CNNs) targeting edge computing.
arXiv Detail & Related papers (2022-06-15T18:11:37Z) - VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using
Vector Quantization [70.8567058758375]
VQ-GNN is a universal framework to scale up any convolution-based GNNs using Vector Quantization (VQ) without compromising the performance.
Our framework avoids the "neighbor explosion" problem of GNNs using quantized representations combined with a low-rank version of the graph convolution matrix.
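The core vector-quantization step, abstracted away from VQ-GNN's low-rank convolution, is simply a nearest-codeword lookup: node features are replaced by a small set of shared codewords so that message passing does not need every neighbor's exact feature vector. A minimal sketch with hypothetical names:

```python
import torch

def vector_quantize(features: torch.Tensor, codebook: torch.Tensor):
    """Assign each node feature to its nearest codeword.

    features: (num_nodes, dim), codebook: (num_codes, dim).
    Returns the codeword indices and the quantized features.
    Illustrative sketch only; VQ-GNN additionally combines this with a
    low-rank form of the graph convolution matrix.
    """
    dists = torch.cdist(features, codebook, p=2)  # pairwise distances
    idx = dists.argmin(dim=1)                     # nearest codeword per node
    quantized = codebook[idx]                     # replace features with codewords
    return idx, quantized
```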
arXiv Detail & Related papers (2021-10-27T11:48:50Z) - A Biased Graph Neural Network Sampler with Near-Optimal Regret [57.70126763759996]
Graph neural networks (GNNs) have emerged as a vehicle for applying deep network architectures to graph and relational data.
In this paper, we build upon existing work and treat GNN neighbor sampling as a multi-armed bandit problem.
We introduce a newly designed reward function that adds some degree of bias in order to reduce variance and avoid unstable, possibly unbounded payouts.
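In a bandit view of neighbor sampling, each neighbor is an arm and sampling favors neighbors with high estimated reward plus an exploration bonus. The snippet below is a generic UCB-style selector with a placeholder reward table, not the paper's biased reward design:

```python
import math

def ucb_select_neighbors(neighbors, counts, rewards, k, t, c=1.0):
    """Pick k neighbors by an Upper-Confidence-Bound score.

    counts[v]  - how often neighbor v has been sampled so far
    rewards[v] - running mean of the (possibly biased) reward for v
    t          - current round, used in the exploration bonus
    Generic bandit sketch, not the paper's actual estimator.
    """
    def score(v):
        if counts[v] == 0:
            return float("inf")  # force exploration of unseen neighbors
        return rewards[v] + c * math.sqrt(math.log(t + 1) / counts[v])
    return sorted(neighbors, key=score, reverse=True)[:k]
```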
arXiv Detail & Related papers (2021-03-01T15:55:58Z) - AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance than the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z) - SGQuant: Squeezing the Last Bit on Graph Neural Networks with
Specialized Quantization [16.09107108787834]
We propose a specialized GNN quantization scheme, SGQuant, to systematically reduce the GNN memory consumption.
We show that SGQuant can effectively reduce the memory footprint by 4.25x to 31.9x compared with the original full-precision GNNs.
arXiv Detail & Related papers (2020-07-09T22:42:34Z) - AUSN: Approximately Uniform Quantization by Adaptively Superimposing
Non-uniform Distribution for Deep Neural Networks [0.7378164273177589]
Existing uniform and non-uniform quantization methods exhibit an inherent conflict between representation range and resolution.
We propose a novel quantization method to quantize the weights and activations.
The key idea is to Approximate the Uniform quantization by Adaptively Superposing multiple Non-uniform quantized values, namely AUSN.
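To illustrate the superposition idea in general terms (not AUSN's actual scheme), a value can be greedily approximated as the sum of a few signed power-of-two terms, so a roughly uniform-looking grid emerges from non-uniform components; names and defaults below are made up:

```python
import numpy as np

def superimposed_pot_quantize(x: np.ndarray, num_terms: int = 2,
                              max_exp: int = 0, min_exp: int = -6) -> np.ndarray:
    """Greedy additive power-of-two quantization.

    Each value is approximated by the sum of ``num_terms`` signed power-of-two
    terms (zero allowed), combining wide range with fine resolution near zero.
    Generic sketch of the superposition idea only.
    """
    residual = np.asarray(x, dtype=np.float64).copy()
    approx = np.zeros_like(residual)
    # Candidate step sizes: 0 plus powers of two between 2**min_exp and 2**max_exp.
    candidates = np.concatenate(([0.0], 2.0 ** np.arange(max_exp, min_exp - 1, -1.0)))
    for _ in range(num_terms):
        mags = np.abs(residual)
        # Element-wise, pick the candidate magnitude closest to the residual.
        best = candidates[np.argmin(np.abs(mags[..., None] - candidates), axis=-1)]
        term = np.sign(residual) * best
        approx += term
        residual -= term
    return approx
```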
arXiv Detail & Related papers (2020-07-08T05:10:53Z) - Optimization and Generalization Analysis of Transduction through
Gradient Boosting and Application to Multi-scale Graph Neural Networks [60.22494363676747]
It is known that current graph neural networks (GNNs) are difficult to make deep due to the problem known as over-smoothing.
Multi-scale GNNs are a promising approach for mitigating the over-smoothing problem.
We derive the optimization and generalization guarantees of transductive learning algorithms that include multi-scale GNNs.
arXiv Detail & Related papers (2020-06-15T17:06:17Z) - Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)