Accelerating Large Scale Real-Time GNN Inference using Channel Pruning
- URL: http://arxiv.org/abs/2105.04528v1
- Date: Mon, 10 May 2021 17:28:44 GMT
- Title: Accelerating Large Scale Real-Time GNN Inference using Channel Pruning
- Authors: Hongkuan Zhou and Ajitesh Srivastava and Hanqing Zeng and Rajgopal
Kannan and Viktor Prasanna
- Abstract summary: Graph Neural Networks (GNNs) have proven to be powerful models for generating node embeddings for downstream applications.
However, due to the high computational complexity of GNN inference, it is hard to deploy GNNs for large-scale or real-time applications.
We propose to accelerate GNN inference by pruning the dimensions in each layer with negligible accuracy loss.
- Score: 7.8799581908375185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Neural Networks (GNNs) have proven to be powerful models for generating
node embeddings for downstream applications. However, due to the high
computational complexity of GNN inference, it is hard to deploy GNNs for
large-scale or real-time applications. In this paper, we propose to accelerate
GNN inference by pruning the dimensions in each layer with negligible accuracy
loss. Our pruning framework uses a novel LASSO regression formulation for GNNs
to identify feature dimensions (channels) that have high influence on the
output activation. We identify two inference scenarios and design a pruning
scheme for each based on its computation and memory usage. To further reduce
the inference complexity, we effectively store and reuse hidden features of
visited nodes, which significantly reduces the number of supporting nodes
needed to compute the target embedding. We evaluate the proposed method with
the node classification problem on five popular datasets and a real-time spam
detection application. We demonstrate that the pruned GNN models greatly reduce
computation and memory usage with little accuracy loss. For full inference, the
proposed method achieves an average of 3.27x speedup with only 0.002 drop in
F1-Micro on GPU. For batched inference, the proposed method achieves an average
of 6.67x speedup with only 0.003 drop in F1-Micro on CPU. To the best of our
knowledge, we are the first to accelerate large scale real-time GNN inference
through channel pruning.
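To make the pruning idea concrete, the sketch below illustrates a LASSO-style channel selection for a single GNN layer: each input channel receives one regression coefficient, channels whose coefficients contribute little to reconstructing the layer's output activation are dropped, and the remaining weights are re-fit by least squares. This is an illustrative reading of the abstract, not the authors' released code; the function name, array shapes, and the use of scikit-learn's Lasso are assumptions.

```python
# Minimal sketch of LASSO-based channel selection for one GNN layer
# (illustrative; not the paper's implementation).
import numpy as np
from sklearn.linear_model import Lasso

def select_channels(X_agg, W, Y, n_keep, alpha=1e-4):
    """Keep the input channels whose removal least perturbs the layer output.

    X_agg : (n, c_in)     aggregated neighbor features entering the layer
    W     : (c_in, c_out) original layer weights
    Y     : (n, c_out)    original output activation to be reconstructed
    """
    n, c_in = X_agg.shape
    # Per-channel contribution to the output: Y ~= sum_c Z[:, c, :]
    Z = X_agg[:, :, None] * W[None, :, :]                # (n, c_in, c_out)
    # Flatten so each channel gets a single LASSO coefficient beta_c.
    A = Z.transpose(1, 0, 2).reshape(c_in, -1).T          # (n * c_out, c_in)
    y = Y.reshape(-1)
    beta = Lasso(alpha=alpha, fit_intercept=False).fit(A, y).coef_
    keep = np.sort(np.argsort(-np.abs(beta))[:n_keep])    # most influential channels
    # Re-fit the remaining weights on the kept channels by least squares.
    W_new = np.linalg.lstsq(X_agg[:, keep], Y, rcond=None)[0]
    return keep, W_new

# Toy usage: prune 128 aggregated channels down to 32.
# X = np.random.randn(1000, 128); W = np.random.randn(128, 64); Y = X @ W
# keep, W_small = select_channels(X, W, Y, n_keep=32)
```

The abstract also describes storing and reusing hidden features of visited nodes so that a new target embedding needs far fewer supporting nodes. Below is a hedged sketch of that caching idea for a 2-layer mean-aggregation GNN; the class, weight names, and aggregation rule are assumptions, and the paper's actual storage policy may differ.

```python
# Minimal sketch of hidden-feature reuse for batched/real-time inference
# (illustrative; not the paper's implementation).
import numpy as np

class CachedInference:
    def __init__(self, W1, W2, adj, X):
        self.W1, self.W2 = W1, W2   # layer weights: (d_in, d_h) and (d_h, d_out)
        self.adj, self.X = adj, X   # adjacency lists and raw node features
        self.h1 = {}                # node id -> cached layer-1 hidden feature

    def _layer1(self, v):
        h = self.h1.get(v)
        if h is None:               # cache miss: expand v's 1-hop neighborhood once
            nbrs = self.adj[v]
            agg = self.X[nbrs].mean(axis=0) if nbrs else self.X[v]
            h = np.maximum((self.X[v] + agg) @ self.W1, 0.0)   # ReLU
            self.h1[v] = h          # reuse for all later target nodes
        return h

    def embed(self, v):
        # Only neighbors missing from the cache trigger a further expansion.
        hs = [self._layer1(u) for u in self.adj[v]] or [self._layer1(v)]
        return (self._layer1(v) + np.mean(hs, axis=0)) @ self.W2
```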
Related papers
- Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN)
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
- Graph Neural Network for Accurate and Low-complexity SAR ATR [2.9766397696234996]
We propose a graph neural network (GNN) model to achieve accurate and low-latency SAR ATR.
The proposed GNN model has low computation complexity and achieves comparable high accuracy.
Compared with the state-of-the-art CNNs, the proposed GNN model has only 1/3000 computation cost and 1/80 model size.
arXiv Detail & Related papers (2023-05-11T20:17:41Z)
- $\rm A^2Q$: Aggregation-Aware Quantization for Graph Neural Networks [18.772128348519566]
We propose the Aggregation-Aware mixed-precision Quantization ($\rm A^2Q$) for Graph Neural Networks (GNNs)
Our method can achieve up to 11.4% and 9.5% accuracy improvements on the node-level and graph-level tasks, respectively, and up to 2x speedup on a dedicated hardware accelerator.
arXiv Detail & Related papers (2023-02-01T02:54:35Z)
- Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks [52.566735716983956]
We propose a graph gradual pruning framework termed CGP to dynamically prune GNNs.
Unlike LTH-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs.
Our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
arXiv Detail & Related papers (2022-07-18T14:23:31Z)
- FlowGNN: A Dataflow Architecture for Universal Graph Neural Network Inference via Multi-Queue Streaming [1.566528527065232]
Graph neural networks (GNNs) have recently exploded in popularity thanks to their broad applicability to graph-related problems.
Meeting demand for novel GNN models and fast inference simultaneously is challenging because of the gap between developing efficient accelerators and the rapid creation of new GNN models.
We propose a generic dataflow architecture for GNN acceleration, named FlowGNN, which can flexibly support the majority of message-passing GNNs.
arXiv Detail & Related papers (2022-04-27T17:59:25Z)
- VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization [70.8567058758375]
VQ-GNN is a universal framework to scale up any convolution-based GNN using Vector Quantization (VQ) without compromising performance.
Our framework avoids the "neighbor explosion" problem of GNNs using quantized representations combined with a low-rank version of the graph convolution matrix.
arXiv Detail & Related papers (2021-10-27T11:48:50Z)
- Deep Graph Neural Networks with Shallow Subgraph Samplers [22.526363992743278]
We propose a simple "deep GNN, shallow sampler" design principle to improve both the GNN accuracy and efficiency.
A properly sampled subgraph may exclude irrelevant or even noisy nodes, and still preserve the critical neighbor features and graph structures.
On the largest public graph dataset, ogbn-papers100M, we achieve state-of-the-art accuracy with an order of magnitude reduction in hardware cost.
arXiv Detail & Related papers (2020-12-02T18:23:48Z)
- Graph Neural Network for Large-Scale Network Localization [35.29322617956428]
Graph neural networks (GNNs) are widely used for classifying structured data in machine learning.
In this work, we adopt GNN for a classic but challenging nonlinear regression problem, namely the network localization.
Our main findings are as follows. First, GNNs are potentially the best solution to large-scale network localization in terms of accuracy, robustness, and computational time.
arXiv Detail & Related papers (2020-10-22T12:39:26Z)
- Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
arXiv Detail & Related papers (2020-07-03T09:30:07Z)
- Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case [93.37576644429578]
Graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice.
We provide a theoretically-grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems.
arXiv Detail & Related papers (2020-06-25T00:45:52Z)
- Fast Graph Attention Networks Using Effective Resistance Based Graph Sparsification [70.50751397870972]
FastGAT is a method to make attention-based GNNs lightweight by using spectral sparsification to generate an optimal pruning of the input graph.
We experimentally evaluate FastGAT on several large real world graph datasets for node classification tasks.
arXiv Detail & Related papers (2020-06-15T22:07:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.