Accelerating Large Scale Real-Time GNN Inference using Channel Pruning
- URL: http://arxiv.org/abs/2105.04528v1
- Date: Mon, 10 May 2021 17:28:44 GMT
- Title: Accelerating Large Scale Real-Time GNN Inference using Channel Pruning
- Authors: Hongkuan Zhou and Ajitesh Srivastava and Hanqing Zeng and Rajgopal
Kannan and Viktor Prasanna
- Abstract summary: Graph Neural Networks (GNNs) have proven to be powerful models for generating node embeddings for downstream applications.
However, due to the high computational complexity of GNN inference, it is hard to deploy GNNs for large-scale or real-time applications.
We propose to accelerate GNN inference by pruning the dimensions in each layer with negligible accuracy loss.
- Score: 7.8799581908375185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Neural Networks (GNNs) have proven to be powerful models for generating
node embeddings for downstream applications. However, due to the high
computational complexity of GNN inference, it is hard to deploy GNNs for
large-scale or real-time applications. In this paper, we propose to accelerate
GNN inference by pruning the dimensions in each layer with negligible accuracy
loss. Our pruning framework uses a novel LASSO regression formulation for GNNs
to identify feature dimensions (channels) that have high influence on the
output activation. We identify two inference scenarios and design a pruning
scheme for each based on its computation and memory usage. To further reduce
the inference complexity, we effectively store and reuse hidden features of
visited nodes, which significantly reduces the number of supporting nodes
needed to compute the target embedding. We evaluate the proposed method with
the node classification problem on five popular datasets and a real-time spam
detection application. We demonstrate that the pruned GNN models greatly reduce
computation and memory usage with little accuracy loss. For full inference, the
proposed method achieves an average of 3.27x speedup with only 0.002 drop in
F1-Micro on GPU. For batched inference, the proposed method achieves an average
of 6.67x speedup with only 0.003 drop in F1-Micro on CPU. To the best of our
knowledge, we are the first to accelerate large scale real-time GNN inference
through channel pruning.
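To make the pruning idea concrete, the sketch below illustrates a LASSO-style channel selection for a single GNN layer: each input channel receives one regression coefficient, channels whose coefficients contribute little to reconstructing the layer's output activation are dropped, and the remaining weights are re-fit by least squares. This is an illustrative reading of the abstract, not the authors' released code; the function name, array shapes, and the use of scikit-learn's Lasso are assumptions.

```python
# Minimal sketch of LASSO-based channel selection for one GNN layer
# (illustrative; not the paper's implementation).
import numpy as np
from sklearn.linear_model import Lasso

def select_channels(X_agg, W, Y, n_keep, alpha=1e-4):
    """Keep the input channels whose removal least perturbs the layer output.

    X_agg : (n, c_in)     aggregated neighbor features entering the layer
    W     : (c_in, c_out) original layer weights
    Y     : (n, c_out)    original output activation to be reconstructed
    """
    n, c_in = X_agg.shape
    # Per-channel contribution to the output: Y ~= sum_c Z[:, c, :]
    Z = X_agg[:, :, None] * W[None, :, :]                # (n, c_in, c_out)
    # Flatten so each channel gets a single LASSO coefficient beta_c.
    A = Z.transpose(1, 0, 2).reshape(c_in, -1).T          # (n * c_out, c_in)
    y = Y.reshape(-1)
    beta = Lasso(alpha=alpha, fit_intercept=False).fit(A, y).coef_
    keep = np.sort(np.argsort(-np.abs(beta))[:n_keep])    # most influential channels
    # Re-fit the remaining weights on the kept channels by least squares.
    W_new = np.linalg.lstsq(X_agg[:, keep], Y, rcond=None)[0]
    return keep, W_new

# Toy usage: prune 128 aggregated channels down to 32.
# X = np.random.randn(1000, 128); W = np.random.randn(128, 64); Y = X @ W
# keep, W_small = select_channels(X, W, Y, n_keep=32)
```

The abstract also describes storing and reusing hidden features of visited nodes so that a new target embedding needs far fewer supporting nodes. Below is a hedged sketch of that caching idea for a 2-layer mean-aggregation GNN; the class, weight names, and aggregation rule are assumptions, and the paper's actual storage policy may differ.

```python
# Minimal sketch of hidden-feature reuse for batched/real-time inference
# (illustrative; not the paper's implementation).
import numpy as np

class CachedInference:
    def __init__(self, W1, W2, adj, X):
        self.W1, self.W2 = W1, W2   # layer weights: (d_in, d_h) and (d_h, d_out)
        self.adj, self.X = adj, X   # adjacency lists and raw node features
        self.h1 = {}                # node id -> cached layer-1 hidden feature

    def _layer1(self, v):
        h = self.h1.get(v)
        if h is None:               # cache miss: expand v's 1-hop neighborhood once
            nbrs = self.adj[v]
            agg = self.X[nbrs].mean(axis=0) if nbrs else self.X[v]
            h = np.maximum((self.X[v] + agg) @ self.W1, 0.0)   # ReLU
            self.h1[v] = h          # reuse for all later target nodes
        return h

    def embed(self, v):
        # Only neighbors missing from the cache trigger a further expansion.
        hs = [self._layer1(u) for u in self.adj[v]] or [self._layer1(v)]
        return (self._layer1(v) + np.mean(hs, axis=0)) @ self.W2
```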
Related papers
- Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN)
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
- Graph Neural Network for Accurate and Low-complexity SAR ATR [2.9766397696234996]
We propose a graph neural network (GNN) model to achieve accurate and low-latency SAR ATR.
The proposed GNN model has low computation complexity and achieves comparable high accuracy.
Compared with the state-of-the-art CNNs, the proposed GNN model has only 1/3000 computation cost and 1/80 model size.
arXiv Detail & Related papers (2023-05-11T20:17:41Z)
- $\rm A^2Q$: Aggregation-Aware Quantization for Graph Neural Networks [18.772128348519566]
We propose the Aggregation-Aware mixed-precision Quantization ($\rm A^2Q$) for Graph Neural Networks (GNNs)
Our method can achieve up to 11.4% and 9.5% accuracy improvements on the node-level and graph-level tasks, respectively, and up to 2x speedup on a dedicated hardware accelerator.
arXiv Detail & Related papers (2023-02-01T02:54:35Z)
- Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks [52.566735716983956]
We propose a graph gradual pruning framework termed CGP to dynamically prune GNNs.
Unlike LTH-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs.
Our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
arXiv Detail & Related papers (2022-07-18T14:23:31Z)
- FlowGNN: A Dataflow Architecture for Universal Graph Neural Network Inference via Multi-Queue Streaming [1.566528527065232]
Graph neural networks (GNNs) have recently exploded in popularity thanks to their broad applicability to graph-related problems.
Meeting demand for novel GNN models and fast inference simultaneously is challenging because of the gap between developing efficient accelerators and the rapid creation of new GNN models.
We propose a generic dataflow architecture for GNN acceleration, named FlowGNN, which can flexibly support the majority of message-passing GNNs.
arXiv Detail & Related papers (2022-04-27T17:59:25Z)
- VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization [70.8567058758375]
VQ-GNN is a universal framework to scale up any convolution-based GNN using Vector Quantization (VQ) without compromising performance.
Our framework avoids the "neighbor explosion" problem of GNNs using quantized representations combined with a low-rank version of the graph convolution matrix.
arXiv Detail & Related papers (2021-10-27T11:48:50Z)
- Deep Graph Neural Networks with Shallow Subgraph Samplers [22.526363992743278]
We propose a simple "deep GNN, shallow sampler" design principle to improve both the GNN accuracy and efficiency.
A properly sampled subgraph may exclude irrelevant or even noisy nodes, and still preserve the critical neighbor features and graph structures.
On the largest public graph dataset, ogbn-papers100M, we achieve state-of-the-art accuracy with an order of magnitude reduction in hardware cost.
arXiv Detail & Related papers (2020-12-02T18:23:48Z)
- Graph Neural Network for Large-Scale Network Localization [35.29322617956428]
Graph neural networks (GNNs) are widely used for classifying structured data in machine learning.
In this work, we adopt GNN for a classic but challenging nonlinear regression problem, namely the network localization.
Our main findings are as follows. First, GNNs are potentially the best solution to large-scale network localization in terms of accuracy, robustness, and computational time.
arXiv Detail & Related papers (2020-10-22T12:39:26Z)
- Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
arXiv Detail & Related papers (2020-07-03T09:30:07Z)
- Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case [93.37576644429578]
Graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice.
We provide a theoretically-grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems.
arXiv Detail & Related papers (2020-06-25T00:45:52Z)
- Fast Graph Attention Networks Using Effective Resistance Based Graph Sparsification [70.50751397870972]
FastGAT is a method to make attention-based GNNs lightweight by using spectral sparsification to generate an optimal pruning of the input graph.
We experimentally evaluate FastGAT on several large real world graph datasets for node classification tasks.
arXiv Detail & Related papers (2020-06-15T22:07:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.