I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality
Enhancement through Islandization
- URL: http://arxiv.org/abs/2203.03606v1
- Date: Mon, 7 Mar 2022 18:56:40 GMT
- Title: I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality
Enhancement through Islandization
- Authors: Tong Geng, Chunshu Wu, Yongan Zhang, Cheng Tan, Chenhao Xie, Haoran
You, Martin C. Herbordt, Yingyan Lin, Ang Li
- Abstract summary: Graph Convolutional Networks (GCNs) have drawn tremendous attention in the past three years.
High-performance hardware acceleration of GCNs is just as critical as for other deep learning modalities, but even more challenging.
We propose a novel hardware accelerator for GCN inference, called I-GCN, that significantly improves data locality and reduces unnecessary computation.
- Score: 29.070089261016832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Convolutional Networks (GCNs) have drawn tremendous attention in the
past three years. Compared with other deep learning modalities,
high-performance hardware acceleration of GCNs is as critical but even more
challenging. The hurdles arise from the poor data locality and redundant
computation due to the large size, high sparsity, and irregular non-zero
distribution of real-world graphs.
In this paper we propose a novel hardware accelerator for GCN inference,
called I-GCN, that significantly improves data locality and reduces unnecessary
computation. The mechanism is a new online graph restructuring algorithm we
refer to as islandization. The proposed algorithm finds clusters of nodes with
strong internal but weak external connections. The islandization process yields
two major benefits. First, by processing islands rather than individual nodes,
there is better on-chip data reuse and fewer off-chip memory accesses. Second,
there is less redundant computation as aggregation for common/shared neighbors
in an island can be reused. The parallel search, identification, and
exploitation of graph islands are handled purely in hardware at runtime,
operating as an incremental pipeline, without any preprocessing of the graph
data or adjustment of the GCN model structure.
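As a rough software analogue of this process (I-GCN implements it purely in hardware; the BFS-style growth, size cap, and internal-edge threshold below are illustrative assumptions, not the paper's exact island-hunting algorithm), islandization can be pictured as greedily growing clusters that admit a node only when most of its edges point back inside:

```python
from collections import deque

def find_islands(adj, max_size=64, min_internal=0.5):
    """Greedily grow 'islands': clusters whose members keep most of
    their edges inside the cluster.

    adj: dict mapping node -> set of neighbors (undirected graph).
    max_size and min_internal are illustrative tuning knobs.
    """
    unassigned = set(adj)
    islands = []
    while unassigned:
        seed = unassigned.pop()          # start a new island anywhere
        island = {seed}
        frontier = deque(adj[seed] & unassigned)
        while frontier and len(island) < max_size:
            v = frontier.popleft()
            if v not in unassigned:
                continue                 # already claimed by an island
            # admit v only if its connections are mostly internal
            if len(adj[v] & island) / len(adj[v]) >= min_internal:
                island.add(v)
                unassigned.discard(v)
                frontier.extend(adj[v] & unassigned)
        islands.append(island)
    return islands
```

Once islands are formed, the nodes of an island are processed together so their features stay on-chip, and an aggregation partial for a neighbor shared by several island members can be computed once and reused.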
Experimental results show that I-GCN can significantly reduce off-chip
accesses and prune 38% of aggregation operations, leading to average speedups
of 5549x over CPUs, 403x over GPUs, and 5.7x over prior-art GCN accelerators.
Related papers
- Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and a 16.8% performance gain on ogbn-products and snap-patents, while also scaling LargeGT on ogbn-papers100M with a 5.9% performance improvement.
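As a generic picture of why sampled local attention scales (the function below is a hypothetical stand-in, not LargeGT's actual mechanism), attending over at most k sampled neighbors caps the per-node cost regardless of degree:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_attention(h, adj, node, k=8):
    """Aggregate features for `node` by attending over at most k
    sampled neighbors (plus itself), so the cost stays O(k * d) even
    for very high-degree nodes.

    h: (N, d) feature matrix; adj: dict node -> list of neighbors.
    """
    nbrs = list(adj[node]) + [node]          # include self; never empty
    if len(nbrs) > k:
        nbrs = list(rng.choice(nbrs, size=k, replace=False))
    keys = h[np.asarray(nbrs)]               # (k, d) gathered features
    scores = keys @ h[node] / np.sqrt(h.shape[1])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ keys                    # attention-weighted average
```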
arXiv Detail & Related papers (2023-12-18T11:19:23Z)
- Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution Networks [12.181052673940465]
Graph Convolutional Networks (GCNs) are pivotal in extracting latent information from graph data across various domains.
We present Accel-GCN, a GPU accelerator architecture for GCNs.
Evaluation of Accel-GCN across 18 benchmark graphs shows that it outperforms cuSPARSE, GNNAdvisor, and graph-BLAST by factors of 1.17x, 1.86x, and 2.94x, respectively.
arXiv Detail & Related papers (2023-08-22T23:12:17Z)
- INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results demonstrating speedups of 1.8-4.8x and 1.5-3.6x over CPU and GPU baselines, respectively.
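To make "arbitrary-order gradient of a computation graph" concrete, here is a minimal pure-Python sketch using nested forward-mode dual numbers; INR-Arch compiles such computations into a dataflow architecture rather than evaluating them this way:

```python
class Dual:
    """Forward-mode dual number a + b*eps; nesting Duals inside Duals
    yields higher-order derivatives."""
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # product rule: (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps
        return Dual(self.val * o.val, self.val * o.eps + self.eps * o.val)
    __rmul__ = __mul__

def grad(f):
    """Return df/dx as a function; grad(grad(f)) gives 2nd order, etc."""
    return lambda x: f(Dual(x, 1.0)).eps

f = lambda x: x * x * x                 # f(x) = x^3 as a computation graph
print(grad(f)(2.0))                     # 12.0  (f'   = 3x^2)
print(grad(grad(f))(2.0))               # 12.0  (f''  = 6x)
print(grad(grad(grad(f)))(2.0))         # 6.0   (f''' = 6)
```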
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
- Scalable Graph Convolutional Network Training on Distributed-Memory Systems [5.169989177779801]
Graph Convolutional Networks (GCNs) are extensively utilized for deep learning on graphs.
Since the convolution operation on graphs induces irregular memory access patterns, designing a memory- and communication-efficient parallel algorithm for GCN training poses unique challenges.
We propose a highly parallel training algorithm that scales to large processor counts.
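The irregularity stems from the neighbor gather at the heart of GCN aggregation; a minimal CSR-based sketch (illustrative, not the paper's parallel algorithm) makes the scattered row accesses visible:

```python
import numpy as np

def aggregate(indptr, indices, X):
    """One GCN aggregation step over a CSR graph: out[i] = sum of X[j]
    over neighbors j of i. The gather X[nbrs] jumps to arbitrary rows,
    which is the irregular memory access pattern that makes efficient
    parallelization hard."""
    out = np.zeros_like(X)
    for i in range(len(indptr) - 1):
        nbrs = indices[indptr[i]:indptr[i + 1]]  # neighbor IDs of node i
        out[i] = X[nbrs].sum(axis=0)             # scattered row gather
    return out

# Example: path graph 0-1-2 in CSR form
indptr  = np.array([0, 1, 3, 4])
indices = np.array([1, 0, 2, 1])
X = np.eye(3, dtype=np.float32)
print(aggregate(indptr, indices, X))
```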
arXiv Detail & Related papers (2022-12-09T17:51:13Z)
- GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design [27.311994997480745]
Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art graph learning model.
It can be notoriously challenging to run GCN inference over large graph datasets.
This paper proposes a GCN algorithm and accelerator co-design framework, dubbed GCoD, which can largely alleviate the aforementioned GCN irregularity.
arXiv Detail & Related papers (2021-12-22T00:30:50Z)
- GNNIE: GNN Inference Engine with Load-balancing and Graph-Specific Caching [2.654276707313136]
GNNIE is an accelerator designed to run a broad range of Graph Neural Networks (GNNs).
It tackles workload imbalance by (i) splitting node feature operands into blocks, (ii) reordering and redistributing computations, and (iii) using a flexible MAC architecture with low communication overheads among the processing elements.
GNNIE achieves average speedups of over 8890x relative to a CPU and 295x relative to a GPU across multiple datasets on graph attention networks (GATs), graph convolutional networks (GCNs), GraphSAGE, GINConv, and DiffPool.
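A toy software sketch of ideas (i) and (ii) above: splitting each node's feature vector into fixed-width blocks turns one huge high-degree task into many small ones that can be dealt to processing elements by load. The degree-based cost model and greedy assignment are illustrative assumptions, not GNNIE's actual scheduling scheme.

```python
def make_work_items(feat_dim, degrees, block=16):
    """Split every node's feature vector into `block`-wide chunks so a
    single high-degree node becomes many small tasks instead of one
    huge one. Cost ~ degree per block is an illustrative proxy."""
    items = []
    for node, deg in enumerate(degrees):
        for start in range(0, feat_dim, block):
            stop = min(start + block, feat_dim)
            items.append((node, start, stop, deg))
    items.sort(key=lambda t: -t[3])          # heavy tasks first
    return items

def assign_to_pes(items, num_pes=4):
    """Greedy least-loaded assignment of work items to PEs."""
    pes, loads = [[] for _ in range(num_pes)], [0] * num_pes
    for it in items:
        p = loads.index(min(loads))
        pes[p].append(it)
        loads[p] += it[3]
    return pes

# Example: a skewed degree distribution still yields balanced PE loads.
pes = assign_to_pes(make_work_items(feat_dim=64, degrees=[100, 3, 5, 2, 8]))
print([sum(it[3] for it in pe) for pe in pes])   # [118, 118, 118, 118]
```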
arXiv Detail & Related papers (2021-05-21T20:07:14Z)
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks [58.48833325238537]
Full-batch training of Graph Neural Networks (GNNs) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z)
- Towards Efficient Graph Convolutional Networks for Point Cloud Handling [181.59146413326056]
We aim at improving the computational efficiency of graph convolutional networks (GCNs) for learning on point clouds.
A series of experiments shows that the optimized networks have reduced computational complexity, decreased memory consumption, and accelerated inference speed.
arXiv Detail & Related papers (2021-04-12T17:59:16Z)
- Graph Highway Networks [77.38665506495553]
Graph Convolution Networks (GCN) are widely used in learning graph representations due to their effectiveness and efficiency.
However, they suffer from the notorious over-smoothing problem, in which the learned representations converge to similar vectors when many layers are stacked.
We propose Graph Highway Networks (GHNet) which utilize gating units to balance the trade-off between homogeneity and heterogeneity in the GCN learning process.
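A generic highway-style gate illustrates the idea (the precise GHNet formulation is in the paper): a learned per-node, per-feature gate mixes the smoothed neighborhood signal with the node's own representation, so stacked layers need not drive all embeddings toward the same vector.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_gcn_layer(H, A_hat, W, W_gate, b_gate):
    """One gated layer: the gate decides how much smoothed neighborhood
    signal to accept (homogeneity) versus how much of the node's current
    representation to carry through (heterogeneity). Assumes the layer
    preserves the feature width so the two branches can be mixed."""
    smooth = A_hat @ H @ W               # standard GCN propagation
    gate = sigmoid(H @ W_gate + b_gate)  # learned, in (0, 1)
    return gate * smooth + (1.0 - gate) * H
```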
arXiv Detail & Related papers (2020-04-09T16:26:43Z)
- L$^2$-GCN: Layer-Wise and Learned Efficient Training of Graph Convolutional Networks [118.37805042816784]
Graph convolution networks (GCN) are increasingly popular in many applications, yet remain notoriously hard to train over large graph datasets.
We propose a novel efficient layer-wise training framework for GCN (L-GCN), that disentangles feature aggregation and feature transformation during training.
Experiments show that L-GCN is faster than the state of the art by at least an order of magnitude, with memory usage that remains consistent regardless of dataset size.
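A toy sketch of the disentangling idea: each layer's aggregation is computed once as fixed input, then only that layer's weights are fitted before moving on, so no end-to-end backpropagation through the stack is required. The least-squares probe against the labels is an illustrative stand-in for L-GCN's learned per-layer objectives.

```python
import numpy as np

def train_layerwise(A_hat, X, Y, num_layers=2, lr=0.1, steps=200):
    """Fit a GCN layer by layer: aggregation (A_hat @ H) is computed once
    per layer as a fixed input; only that layer's weight matrix is then
    trained before moving on to the next layer."""
    H = X
    for _ in range(num_layers):
        Z = A_hat @ H                    # feature aggregation (fixed)
        W = np.zeros((Z.shape[1], Y.shape[1]))
        for _ in range(steps):           # feature transformation (trained)
            err = Z @ W - Y              # toy proxy loss: ||ZW - Y||^2
            W -= lr * (Z.T @ err) / len(Z)
        H = np.maximum(Z @ W, 0.0)       # frozen output feeds next layer
    return H
```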
arXiv Detail & Related papers (2020-03-30T16:37:56Z)