I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality
Enhancement through Islandization
- URL: http://arxiv.org/abs/2203.03606v1
- Date: Mon, 7 Mar 2022 18:56:40 GMT
- Title: I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality
Enhancement through Islandization
- Authors: Tong Geng, Chunshu Wu, Yongan Zhang, Cheng Tan, Chenhao Xie, Haoran
You, Martin C. Herbordt, Yingyan Lin, Ang Li
- Abstract summary: Graph Convolutional Networks (GCNs) have drawn tremendous attention in the past three years.
High-performance hardware acceleration of GCNs is just as critical as for other deep learning modalities, but even more challenging.
We propose a novel hardware accelerator for GCN inference, called I-GCN, that significantly improves data locality and reduces unnecessary computation.
- Score: 29.070089261016832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Convolutional Networks (GCNs) have drawn tremendous attention in the
past three years. Compared with other deep learning modalities,
high-performance hardware acceleration of GCNs is as critical but even more
challenging. The hurdles arise from the poor data locality and redundant
computation due to the large size, high sparsity, and irregular non-zero
distribution of real-world graphs.
In this paper we propose a novel hardware accelerator for GCN inference,
called I-GCN, that significantly improves data locality and reduces unnecessary
computation. The mechanism is a new online graph restructuring algorithm we
refer to as islandization. The proposed algorithm finds clusters of nodes with
strong internal but weak external connections. The islandization process yields
two major benefits. First, by processing islands rather than individual nodes,
there is better on-chip data reuse and fewer off-chip memory accesses. Second,
there is less redundant computation as aggregation for common/shared neighbors
in an island can be reused. The parallel search, identification, and
exploitation of graph islands are handled purely in hardware at runtime,
operating as an incremental pipeline, without any preprocessing of the graph
data or adjustment of the GCN model structure.
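As a rough software analogue of this process (I-GCN implements it purely in hardware; the BFS-style growth, size cap, and internal-edge threshold below are illustrative assumptions, not the paper's exact island-hunting algorithm), islandization can be pictured as greedily growing clusters that admit a node only when most of its edges point back inside:

```python
from collections import deque

def find_islands(adj, max_size=64, min_internal=0.5):
    """Greedily grow 'islands': clusters whose members keep most of
    their edges inside the cluster.

    adj: dict mapping node -> set of neighbors (undirected graph).
    max_size and min_internal are illustrative tuning knobs.
    """
    unassigned = set(adj)
    islands = []
    while unassigned:
        seed = unassigned.pop()          # start a new island anywhere
        island = {seed}
        frontier = deque(adj[seed] & unassigned)
        while frontier and len(island) < max_size:
            v = frontier.popleft()
            if v not in unassigned:
                continue                 # already claimed by an island
            # admit v only if its connections are mostly internal
            if len(adj[v] & island) / len(adj[v]) >= min_internal:
                island.add(v)
                unassigned.discard(v)
                frontier.extend(adj[v] & unassigned)
        islands.append(island)
    return islands
```

Once islands are formed, the nodes of an island are processed together so their features stay on-chip, and an aggregation partial for a neighbor shared by several island members can be computed once and reused.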
Experimental results show that I-GCN can significantly reduce off-chip
accesses and prune 38% of aggregation operations, leading to average speedups
of 5549x over CPUs, 403x over GPUs, and 5.7x over prior-art GCN accelerators.
Related papers
- Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and a 16.8% performance gain on ogbn-products and snap-patents, while also scaling LargeGT on ogbn-papers100M with a 5.9% performance improvement.
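As a generic picture of why sampled local attention scales (the function below is a hypothetical stand-in, not LargeGT's actual mechanism), attending over at most k sampled neighbors caps the per-node cost regardless of degree:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_attention(h, adj, node, k=8):
    """Aggregate features for `node` by attending over at most k
    sampled neighbors (plus itself), so the cost stays O(k * d) even
    for very high-degree nodes.

    h: (N, d) feature matrix; adj: dict node -> list of neighbors.
    """
    nbrs = list(adj[node]) + [node]          # include self; never empty
    if len(nbrs) > k:
        nbrs = list(rng.choice(nbrs, size=k, replace=False))
    keys = h[np.asarray(nbrs)]               # (k, d) gathered features
    scores = keys @ h[node] / np.sqrt(h.shape[1])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ keys                    # attention-weighted average
```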
arXiv Detail & Related papers (2023-12-18T11:19:23Z)
- Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution Networks [12.181052673940465]
Graph Convolutional Networks (GCNs) are pivotal in extracting latent information from graph data across various domains.
We present Accel-GCN, a GPU accelerator architecture for GCNs.
Evaluation of Accel-GCN across 18 benchmark graphs shows that it outperforms cuSPARSE, GNNAdvisor, and graph-BLAST by factors of 1.17x, 1.86x, and 2.94x, respectively.
arXiv Detail & Related papers (2023-08-22T23:12:17Z)
- INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results demonstrating speedups of 1.8-4.8x and 1.5-3.6x over CPU and GPU baselines, respectively.
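To make "arbitrary-order gradient of a computation graph" concrete, here is a minimal pure-Python sketch using nested forward-mode dual numbers; INR-Arch compiles such computations into a dataflow architecture rather than evaluating them this way:

```python
class Dual:
    """Forward-mode dual number a + b*eps; nesting Duals inside Duals
    yields higher-order derivatives."""
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # product rule: (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps
        return Dual(self.val * o.val, self.val * o.eps + self.eps * o.val)
    __rmul__ = __mul__

def grad(f):
    """Return df/dx as a function; grad(grad(f)) gives 2nd order, etc."""
    return lambda x: f(Dual(x, 1.0)).eps

f = lambda x: x * x * x                 # f(x) = x^3 as a computation graph
print(grad(f)(2.0))                     # 12.0  (f'   = 3x^2)
print(grad(grad(f))(2.0))               # 12.0  (f''  = 6x)
print(grad(grad(grad(f)))(2.0))         # 6.0   (f''' = 6)
```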
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
- Scalable Graph Convolutional Network Training on Distributed-Memory Systems [5.169989177779801]
Graph Convolutional Networks (GCNs) are extensively utilized for deep learning on graphs.
Since the convolution operation on graphs induces irregular memory access patterns, designing a memory- and communication-efficient parallel algorithm for GCN training poses unique challenges.
We propose a highly parallel training algorithm that scales to large processor counts.
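The irregularity stems from the neighbor gather at the heart of GCN aggregation; a minimal CSR-based sketch (illustrative, not the paper's parallel algorithm) makes the scattered row accesses visible:

```python
import numpy as np

def aggregate(indptr, indices, X):
    """One GCN aggregation step over a CSR graph: out[i] = sum of X[j]
    over neighbors j of i. The gather X[nbrs] jumps to arbitrary rows,
    which is the irregular memory access pattern that makes efficient
    parallelization hard."""
    out = np.zeros_like(X)
    for i in range(len(indptr) - 1):
        nbrs = indices[indptr[i]:indptr[i + 1]]  # neighbor IDs of node i
        out[i] = X[nbrs].sum(axis=0)             # scattered row gather
    return out

# Example: path graph 0-1-2 in CSR form
indptr  = np.array([0, 1, 3, 4])
indices = np.array([1, 0, 2, 1])
X = np.eye(3, dtype=np.float32)
print(aggregate(indptr, indices, X))
```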
arXiv Detail & Related papers (2022-12-09T17:51:13Z)
- GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design [27.311994997480745]
Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art graph learning model.
It can be notoriously challenging to run GCN inference over large graph datasets.
This paper proposes a GCN algorithm and accelerator co-design framework, dubbed GCoD, which can largely alleviate the aforementioned GCN irregularity.
arXiv Detail & Related papers (2021-12-22T00:30:50Z)
- GNNIE: GNN Inference Engine with Load-balancing and Graph-Specific Caching [2.654276707313136]
GNNIE is an accelerator designed to run a broad range of Graph Neural Networks (GNNs).
It tackles workload imbalance by (i) splitting node feature operands into blocks, (ii) reordering and redistributing computations, and (iii) using a flexible MAC architecture with low communication overheads among the processing elements.
GNNIE achieves average speedups of over 8890x relative to a CPU and 295x relative to a GPU across multiple datasets on graph attention networks (GATs), graph convolutional networks (GCNs), GraphSAGE, GINConv, and DiffPool.
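A toy software sketch of ideas (i) and (ii) above: splitting each node's feature vector into fixed-width blocks turns one huge high-degree task into many small ones that can be dealt to processing elements by load. The degree-based cost model and greedy assignment are illustrative assumptions, not GNNIE's actual scheduling scheme.

```python
def make_work_items(feat_dim, degrees, block=16):
    """Split every node's feature vector into `block`-wide chunks so a
    single high-degree node becomes many small tasks instead of one
    huge one. Cost ~ degree per block is an illustrative proxy."""
    items = []
    for node, deg in enumerate(degrees):
        for start in range(0, feat_dim, block):
            stop = min(start + block, feat_dim)
            items.append((node, start, stop, deg))
    items.sort(key=lambda t: -t[3])          # heavy tasks first
    return items

def assign_to_pes(items, num_pes=4):
    """Greedy least-loaded assignment of work items to PEs."""
    pes, loads = [[] for _ in range(num_pes)], [0] * num_pes
    for it in items:
        p = loads.index(min(loads))
        pes[p].append(it)
        loads[p] += it[3]
    return pes

# Example: a skewed degree distribution still yields balanced PE loads.
pes = assign_to_pes(make_work_items(feat_dim=64, degrees=[100, 3, 5, 2, 8]))
print([sum(it[3] for it in pe) for pe in pes])   # [118, 118, 118, 118]
```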
arXiv Detail & Related papers (2021-05-21T20:07:14Z)
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks [58.48833325238537]
Full-batch training of Graph Neural Networks (GNNs) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z)
- Towards Efficient Graph Convolutional Networks for Point Cloud Handling [181.59146413326056]
We aim at improving the computational efficiency of graph convolutional networks (GCNs) for learning on point clouds.
A series of experiments shows that the optimized networks have reduced computational complexity, decreased memory consumption, and accelerated inference speed.
arXiv Detail & Related papers (2021-04-12T17:59:16Z)
- Graph Highway Networks [77.38665506495553]
Graph Convolution Networks (GCN) are widely used in learning graph representations due to their effectiveness and efficiency.
However, they suffer from the notorious over-smoothing problem, in which the learned representations converge to similar vectors when many layers are stacked.
We propose Graph Highway Networks (GHNet) which utilize gating units to balance the trade-off between homogeneity and heterogeneity in the GCN learning process.
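A generic highway-style gate illustrates the idea (the precise GHNet formulation is in the paper): a learned per-node, per-feature gate mixes the smoothed neighborhood signal with the node's own representation, so stacked layers need not drive all embeddings toward the same vector.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_gcn_layer(H, A_hat, W, W_gate, b_gate):
    """One gated layer: the gate decides how much smoothed neighborhood
    signal to accept (homogeneity) versus how much of the node's current
    representation to carry through (heterogeneity). Assumes the layer
    preserves the feature width so the two branches can be mixed."""
    smooth = A_hat @ H @ W               # standard GCN propagation
    gate = sigmoid(H @ W_gate + b_gate)  # learned, in (0, 1)
    return gate * smooth + (1.0 - gate) * H
```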
arXiv Detail & Related papers (2020-04-09T16:26:43Z)
- L$^2$-GCN: Layer-Wise and Learned Efficient Training of Graph Convolutional Networks [118.37805042816784]
Graph convolution networks (GCN) are increasingly popular in many applications, yet remain notoriously hard to train over large graph datasets.
We propose a novel efficient layer-wise training framework for GCN (L-GCN), that disentangles feature aggregation and feature transformation during training.
Experiments show that L-GCN is faster than the state of the art by at least an order of magnitude, with memory usage that remains consistent regardless of dataset size.
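A toy sketch of the disentangling idea: each layer's aggregation is computed once as fixed input, then only that layer's weights are fitted before moving on, so no end-to-end backpropagation through the stack is required. The least-squares probe against the labels is an illustrative stand-in for L-GCN's learned per-layer objectives.

```python
import numpy as np

def train_layerwise(A_hat, X, Y, num_layers=2, lr=0.1, steps=200):
    """Fit a GCN layer by layer: aggregation (A_hat @ H) is computed once
    per layer as a fixed input; only that layer's weight matrix is then
    trained before moving on to the next layer."""
    H = X
    for _ in range(num_layers):
        Z = A_hat @ H                    # feature aggregation (fixed)
        W = np.zeros((Z.shape[1], Y.shape[1]))
        for _ in range(steps):           # feature transformation (trained)
            err = Z @ W - Y              # toy proxy loss: ||ZW - Y||^2
            W -= lr * (Z.T @ err) / len(Z)
        H = np.maximum(Z @ W, 0.0)       # frozen output feeds next layer
    return H
```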
arXiv Detail & Related papers (2020-03-30T16:37:56Z)