GNNIE: GNN Inference Engine with Load-balancing and Graph-Specific
Caching
- URL: http://arxiv.org/abs/2105.10554v1
- Date: Fri, 21 May 2021 20:07:14 GMT
- Title: GNNIE: GNN Inference Engine with Load-balancing and Graph-Specific
Caching
- Authors: Sudipta Mondal, Susmita Dey Manasi, Kishor Kunal, and Sachin S.
Sapatnekar
- Abstract summary: GNNIE is an accelerator designed to run a broad range of Graph Neural Networks (GNNs).
It tackles workload imbalance by (i) splitting node feature operands into blocks, (ii) reordering and redistributing computations, and (iii) using a flexible MAC architecture with low communication overheads among the processing elements.
GNNIE achieves average speedups of over 8890x over a CPU and 295x over a GPU across multiple datasets on graph attention networks (GATs), graph convolutional networks (GCNs), GraphSAGE, GINConv, and DiffPool.
- Score: 2.654276707313136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analysis engines based on Graph Neural Networks (GNNs) are vital for many
real-world problems that model relationships using large graphs. Challenges for
a GNN hardware platform include the ability to (a) host a variety of GNNs, (b)
handle high sparsity in input node feature vectors and the graph adjacency
matrix and the accompanying random memory access patterns, and (c) maintain
load-balanced computation in the face of uneven workloads induced by high
sparsity and power-law vertex degree distributions in real datasets. This paper
proposes GNNIE, an accelerator designed to run a broad range of GNNs. It
tackles workload imbalance by (i) splitting node feature operands into blocks,
(ii) reordering and redistributing computations, and (iii) using a flexible MAC
architecture with low communication overheads among the processing elements. In
addition, it adopts a graph partitioning scheme and a graph-specific caching
policy that use off-chip memory bandwidth efficiently and are well suited to
the characteristics of real-world graphs. Random memory access effects are
mitigated by partitioning and degree-aware caching to enable the reuse of
high-degree vertices. GNNIE achieves average speedups of over 8890x over a CPU
and 295x over a GPU across multiple datasets on graph attention networks (GATs),
graph convolutional networks (GCNs), GraphSAGE, GINConv, and DiffPool. Compared
to prior approaches, GNNIE achieves an average speedup of 9.74x over HyGCN for
GCN, GraphSAGE, and GINConv; HyGCN cannot implement GATs. GNNIE achieves an
average speedup of 2.28x over AWB-GCN (which runs only GCNs), despite using
3.4x fewer processing units.
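To make the caching idea concrete, the following is a minimal software sketch of degree-aware caching for GNN aggregation: the feature vectors of the highest-degree vertices are pinned on chip so that they are reused across many neighbor lists. It is an illustrative simulation, not a model of GNNIE's hardware; the helper names, the cache capacity, and the toy graph are assumptions made for the example.
```python
# Illustrative sketch of degree-aware caching for GNN aggregation.
# Not GNNIE's hardware design: cache capacity, function names, and the
# toy graph below are hypothetical parameters for the example.
from collections import defaultdict

def build_degree_cache(adj, cache_capacity):
    """Pick which vertices' feature vectors to pin on chip.

    adj: dict mapping vertex -> list of neighbor vertices.
    cache_capacity: number of feature vectors that fit on chip (assumed).
    """
    degree = {v: len(neigh) for v, neigh in adj.items()}
    # High-degree vertices appear in many neighbor lists, so caching them
    # maximizes reuse under a power-law degree distribution.
    ranked = sorted(degree, key=degree.get, reverse=True)
    return set(ranked[:cache_capacity])

def count_offchip_fetches(adj, cached):
    """Count neighbor-feature fetches that miss the on-chip cache."""
    misses = 0
    for v, neighbors in adj.items():
        for u in neighbors:
            if u not in cached:
                misses += 1  # this feature vector must come from off-chip DRAM
    return misses

if __name__ == "__main__":
    # Tiny power-law-ish example graph: vertex 0 is a hub.
    adj = defaultdict(list)
    edges = [(0, i) for i in range(1, 8)] + [(1, 2), (3, 4), (5, 6)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    cached = build_degree_cache(adj, cache_capacity=2)
    baseline = count_offchip_fetches(adj, cached=set())
    with_cache = count_offchip_fetches(adj, cached)
    print(f"off-chip fetches: {baseline} -> {with_cache} with degree-aware cache")
```
Under a power-law degree distribution, even a small budget of cached hub vertices removes a large fraction of off-chip feature fetches, which is the effect the abstract attributes to the graph-specific, degree-aware caching policy.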
Related papers
- Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z)
- Cached Operator Reordering: A Unified View for Fast GNN Training [24.917363701638607]
Graph Neural Networks (GNNs) are a powerful tool for handling structured graph data and addressing tasks such as node classification, graph classification, and clustering.
However, the sparse nature of GNN computation poses new challenges for performance optimization compared to traditional deep neural networks.
We address these challenges by providing a unified view of GNN computation, I/O, and memory.
arXiv Detail & Related papers (2023-08-23T12:27:55Z)
- Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution Networks [12.181052673940465]
Graph Convolutional Networks (GCNs) are pivotal in extracting latent information from graph data across various domains.
We present Accel-GCN, a GPU accelerator architecture for GCNs.
Evaluation of Accel-GCN across 18 benchmark graphs reveals that it outperforms cuSPARSE, GNNAdvisor, and graph-BLAST by factors of 1.17x, 1.86x, and 2.94x, respectively.
arXiv Detail & Related papers (2023-08-22T23:12:17Z)
- Accelerating Generic Graph Neural Networks via Architecture, Compiler, Partition Method Co-Design [15.500725014235412]
Graph neural networks (GNNs) have shown significant accuracy improvements in a variety of graph learning domains.
It is essential to develop high-performance and efficient hardware acceleration for GNN models.
Designers face two fundamental challenges: the high bandwidth requirement of GNN models and the diversity of GNN models.
arXiv Detail & Related papers (2023-08-16T07:05:47Z)
- GHOST: A Graph Neural Network Accelerator using Silicon Photonics [4.226093500082746]
Graph neural networks (GNNs) have emerged as a powerful approach for modelling and learning from graph-structured data.
We present GHOST, the first silicon-photonic hardware accelerator for GNNs.
arXiv Detail & Related papers (2023-07-04T15:37:20Z)
- Communication-Efficient Graph Neural Networks with Probabilistic Neighborhood Expansion Analysis and Caching [59.8522166385372]
Training and inference with graph neural networks (GNNs) on massive graphs has been actively studied since the inception of GNNs.
This paper is concerned with minibatch training and inference with GNNs that employ node-wise sampling in distributed settings.
We present SALIENT++, which extends the prior state-of-the-art SALIENT system to work with partitioned feature data.
arXiv Detail & Related papers (2023-05-04T21:04:01Z)
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks [58.48833325238537]
Full-batch training of Graph Neural Networks (GNNs) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z)
- A Unified Lottery Ticket Hypothesis for Graph Neural Networks [82.31087406264437]
We present a unified GNN sparsification (UGS) framework that simultaneously prunes the graph adjacency matrix and the model weights.
We further generalize the popular lottery ticket hypothesis to GNNs for the first time, by defining a graph lottery ticket (GLT) as a pair of core sub-dataset and sparse sub-network.
arXiv Detail & Related papers (2021-02-12T21:52:43Z)
- Fast Graph Attention Networks Using Effective Resistance Based Graph Sparsification [70.50751397870972]
FastGAT is a method to make attention-based GNNs lightweight by using spectral sparsification to generate an optimal pruning of the input graph.
We experimentally evaluate FastGAT on several large real world graph datasets for node classification tasks.
arXiv Detail & Related papers (2020-06-15T22:07:54Z)
- Graph Highway Networks [77.38665506495553]
Graph Convolution Networks (GCN) are widely used in learning graph representations due to their effectiveness and efficiency.
They suffer from the notorious over-smoothing problem, in which the learned representations converge to similar vectors when many layers are stacked.
We propose Graph Highway Networks (GHNet), which utilize gating units to balance the trade-off between homogeneity and heterogeneity in the GCN learning process (see the sketch after this list).
arXiv Detail & Related papers (2020-04-09T16:26:43Z)
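As a concrete illustration of the gating idea in the Graph Highway Networks entry above, here is a minimal gated graph-convolution layer: a learned sigmoid gate mixes each node's own transformed features with GCN-style neighborhood aggregation, so stacking layers does not force representations to collapse onto each other. The class name GatedGraphConv, the dense row-normalized adjacency, and the per-feature gate are simplifying assumptions for this sketch, not GHNet's exact formulation.
```python
# Minimal sketch of a gated graph-convolution layer in the spirit of
# Graph Highway Networks. Assumptions: dense row-normalized adjacency,
# a single sigmoid gate per output feature; not the paper's exact equations.
import torch
import torch.nn as nn

class GatedGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.self_proj = nn.Linear(in_dim, out_dim)   # transform of the node's own features
        self.neigh_proj = nn.Linear(in_dim, out_dim)  # transform of aggregated neighbor features
        self.gate = nn.Linear(in_dim, out_dim)        # gating unit computed from the node itself

    def forward(self, x, adj):
        # x: node features (N x in_dim); adj: row-normalized adjacency (N x N).
        neigh = adj @ x                      # GCN-style neighborhood aggregation (drives homogeneity)
        h_self = self.self_proj(x)           # keeps the node's own signal (preserves heterogeneity)
        h_neigh = self.neigh_proj(neigh)
        g = torch.sigmoid(self.gate(x))      # gate balances the two paths per feature
        return g * h_self + (1.0 - g) * h_neigh

if __name__ == "__main__":
    # Toy usage on a 4-node graph.
    x = torch.randn(4, 8)
    adj = torch.tensor([[0, 1, 0, 0],
                        [1, 0, 1, 1],
                        [0, 1, 0, 1],
                        [0, 1, 1, 0]], dtype=torch.float32)
    adj = adj / adj.sum(dim=1, keepdim=True)  # row-normalize
    layer = GatedGraphConv(8, 16)
    print(layer(x, adj).shape)  # torch.Size([4, 16])
```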