Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution
Networks
- URL: http://arxiv.org/abs/2308.11825v1
- Date: Tue, 22 Aug 2023 23:12:17 GMT
- Title: Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution
Networks
- Authors: Xi Xie, Hongwu Peng, Amit Hasan, Shaoyi Huang, Jiahui Zhao, Haowen
Fang, Wei Zhang, Tong Geng, Omer Khan, and Caiwen Ding
- Abstract summary: Graph Convolutional Networks (GCNs) are pivotal in extracting latent information from graph data across various domains.
We present Accel-GCN, a GPU accelerator architecture for GCNs.
Evaluation of Accel-GCN across 18 benchmark graphs reveals that it outperforms cuSPARSE, GNNAdvisor, and graph-BLAST by factors of 1.17x, 1.86x, and 2.94x, respectively.
- Score: 12.181052673940465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Convolutional Networks (GCNs) are pivotal in extracting latent
information from graph data across various domains, yet their acceleration on
mainstream GPUs is challenged by workload imbalance and memory access
irregularity. To address these challenges, we present Accel-GCN, a GPU
accelerator architecture for GCNs. The design of Accel-GCN encompasses: (i) a
lightweight degree-sorting stage that groups nodes with similar degrees; (ii) a
block-level partition strategy that dynamically adjusts warp workload sizes,
enhancing shared memory locality and workload balance, and reducing metadata
overhead compared to designs like GNNAdvisor; (iii) a combined warp strategy
that improves memory coalescing and computational parallelism in the column
dimension of dense matrices.
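As a rough illustration of components (i) and (ii), the host-side sketch below sorts CSR rows by degree and then greedily packs consecutive rows into blocks with a bounded nonzero budget. This is a minimal sketch under stated assumptions, not the authors' implementation; the names `RowBlock`, `build_row_blocks`, and `kMaxNnzPerBlock`, and the specific budget value, are all illustrative.

```cuda
// Host-side sketch (assumed, not the released Accel-GCN code) of
// (i) lightweight degree sorting and (ii) block-level partitioning
// of CSR rows into balanced per-block workloads.
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

struct RowBlock {
    int32_t first_row;  // first position (in the sorted order) handled by this block
    int32_t num_rows;   // number of rows grouped into the block
};

constexpr int32_t kMaxNnzPerBlock = 256;  // assumed per-block nonzero budget

// Sort row indices by degree so rows with similar degrees are adjacent, then
// pack consecutive rows into blocks whose total nonzero count stays under a
// fixed budget, which keeps warp workloads balanced.
std::vector<RowBlock> build_row_blocks(const std::vector<int32_t>& row_ptr,
                                       std::vector<int32_t>& perm) {
    const int32_t n = static_cast<int32_t>(row_ptr.size()) - 1;
    perm.resize(n);
    std::iota(perm.begin(), perm.end(), 0);
    std::stable_sort(perm.begin(), perm.end(), [&](int32_t a, int32_t b) {
        return (row_ptr[a + 1] - row_ptr[a]) < (row_ptr[b + 1] - row_ptr[b]);
    });

    std::vector<RowBlock> blocks;
    int32_t start = 0, nnz_in_block = 0;
    for (int32_t i = 0; i < n; ++i) {
        int32_t deg = row_ptr[perm[i] + 1] - row_ptr[perm[i]];
        if (i > start && nnz_in_block + deg > kMaxNnzPerBlock) {
            blocks.push_back({start, i - start});
            start = i;
            nnz_in_block = 0;
        }
        nnz_in_block += deg;
    }
    if (start < n) blocks.push_back({start, n - start});
    return blocks;
}
```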
Utilizing these principles, we formulated a kernel for sparse matrix
multiplication (SpMM) in GCNs that employs block-level partitioning and
combined warp strategy. This approach improves performance and multi-level
memory efficiency, and optimizes memory bandwidth utilization by exploiting memory
coalescing and alignment. Evaluation of Accel-GCN across 18 benchmark graphs
reveals that it outperforms cuSPARSE, GNNAdvisor, and graph-BLAST by factors of
1.17x, 1.86x, and 2.94x, respectively. The results underscore
Accel-GCN as an effective solution for enhancing GCN computational efficiency.
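The kernel sketch below is a hedged approximation of the access pattern the abstract describes, not the released Accel-GCN kernel: each thread block consumes one `RowBlock` from the partition sketched above, and adjacent threads handle adjacent columns of the dense feature matrix, so loads from X and stores to Y in the column dimension are coalesced. Shared-memory staging of the sparse row and the full combined warp strategy are omitted for brevity.

```cuda
// Minimal SpMM kernel sketch (assumed structure): Y = A * X, with A in CSR
// form, rows permuted by degree, and work partitioned by RowBlock.
__global__ void spmm_row_block_kernel(
    const int32_t*  __restrict__ row_ptr,  // CSR row pointers of A (length n+1)
    const int32_t*  __restrict__ col_idx,  // CSR column indices of A
    const float*    __restrict__ vals,     // CSR values of A
    const int32_t*  __restrict__ perm,     // degree-sorted row permutation
    const RowBlock* __restrict__ blocks,   // block-level partition (see sketch above)
    const float*    __restrict__ X,        // dense input,  n x f, row-major
    float*          __restrict__ Y,        // dense output, n x f, row-major
    int32_t f)                             // feature (column) dimension
{
    RowBlock blk = blocks[blockIdx.x];
    for (int32_t r = 0; r < blk.num_rows; ++r) {
        int32_t row = perm[blk.first_row + r];
        // Adjacent threads read adjacent columns of X and write adjacent
        // columns of Y, giving coalesced global-memory accesses.
        for (int32_t c = threadIdx.x; c < f; c += blockDim.x) {
            float acc = 0.0f;
            for (int32_t e = row_ptr[row]; e < row_ptr[row + 1]; ++e)
                acc += vals[e] * X[col_idx[e] * f + c];
            Y[row * f + c] = acc;
        }
    }
}
```

A launch such as `spmm_row_block_kernel<<<num_blocks, 128>>>(...)` would assign one partition block per CUDA thread block; in the actual design the column-dimension work is further spread across combined warps, which this sketch does not attempt to reproduce.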
Related papers
- Efficient Message Passing Architecture for GCN Training on HBM-based FPGAs with Orthogonal Topology On-Chip Networks [0.0]
Graph Convolutional Networks (GCNs) are state-of-the-art deep learning models for representation learning on graphs.
We propose a message-passing architecture that leverages NUMA-based memory access properties.
We also re-engineered the backpropagation algorithm specific to GCNs within our proposed accelerator.
arXiv Detail & Related papers (2024-11-06T12:00:51Z) - Rethinking and Accelerating Graph Condensation: A Training-Free Approach with Class Partition [56.26113670151363]
Graph condensation is a data-centric solution to replace the large graph with a small yet informative condensed graph.
Existing GC methods suffer from intricate optimization processes, necessitating excessive computing resources.
We propose a training-free GC framework termed Class-partitioned Graph Condensation (CGC).
CGC achieves state-of-the-art performance with a more efficient condensation process.
arXiv Detail & Related papers (2024-05-22T14:57:09Z) - Cached Operator Reordering: A Unified View for Fast GNN Training [24.917363701638607]
Graph Neural Networks (GNNs) are a powerful tool for handling structured graph data and addressing tasks such as node classification, graph classification, and clustering.
However, the sparse nature of GNN computation poses new challenges for performance optimization compared to traditional deep neural networks.
We address these challenges by providing a unified view of GNN computation, I/O, and memory.
arXiv Detail & Related papers (2023-08-23T12:27:55Z) - SPA-GCN: Efficient and Flexible GCN Accelerator with an Application for
Graph Similarity Computation [7.54579279348595]
We propose a flexible architecture called SPA-GCN for accelerating Graph Convolutional Networks (GCN) on graphs.
We show that SPA-GCN can deliver a high speedup compared to a multi-core CPU implementation and a GPU implementation.
arXiv Detail & Related papers (2021-11-10T20:47:57Z) - GNNIE: GNN Inference Engine with Load-balancing and Graph-Specific
Caching [2.654276707313136]
GNNIE is an accelerator designed to run a broad range of Graph Neural Networks (GNNs).
It tackles workload imbalance by (i) splitting node feature operands into blocks, (ii) reordering and redistributing computations, and (iii) using a flexible MAC architecture with low communication overheads among the processing elements.
GNNIE achieves average speedups of over 8890x over a CPU and 295x over a GPU across multiple datasets on graph attention networks (GATs), graph convolutional networks (GCNs), GraphSAGE, GINConv, and DiffPool.
arXiv Detail & Related papers (2021-05-21T20:07:14Z) - DistGNN: Scalable Distributed Training for Large-Scale Graph Neural
Networks [58.48833325238537]
Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z) - Towards Efficient Graph Convolutional Networks for Point Cloud Handling [181.59146413326056]
We aim at improving the computational efficiency of graph convolutional networks (GCNs) for learning on point clouds.
A series of experiments show that optimized networks have reduced computational complexity, decreased memory consumption, and accelerated inference speed.
arXiv Detail & Related papers (2021-04-12T17:59:16Z) - Bi-GCN: Binary Graph Convolutional Network [57.733849700089955]
We propose a Binary Graph Convolutional Network (Bi-GCN), which binarizes both the network parameters and input node features.
Our Bi-GCN can reduce the memory consumption by an average of 30x for both the network parameters and input data, and accelerate the inference speed by an average of 47x.
arXiv Detail & Related papers (2020-10-15T07:26:23Z) - Graph Highway Networks [77.38665506495553]
Graph Convolution Networks (GCN) are widely used in learning graph representations due to their effectiveness and efficiency.
They suffer from the notorious over-smoothing problem, in which the learned representations converge to similar vectors when many layers are stacked.
We propose Graph Highway Networks (GHNet) which utilize gating units to balance the trade-off between homogeneity and heterogeneity in the GCN learning process.
arXiv Detail & Related papers (2020-04-09T16:26:43Z) - L$^2$-GCN: Layer-Wise and Learned Efficient Training of Graph
Convolutional Networks [118.37805042816784]
Graph convolution networks (GCN) are increasingly popular in many applications, yet remain notoriously hard to train over large graph datasets.
We propose a novel, efficient layer-wise training framework for GCN (L-GCN) that disentangles feature aggregation and feature transformation during training.
Experiments show that L-GCN is faster than the state of the art by at least an order of magnitude, with consistent memory usage that does not depend on dataset size.
arXiv Detail & Related papers (2020-03-30T16:37:56Z) - GraphACT: Accelerating GCN Training on CPU-FPGA Heterogeneous Platforms [1.2183405753834562]
Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art deep learning model for representation learning on graphs.
It is challenging to accelerate training of GCNs due to substantial and irregular data communication.
We design a novel accelerator for training GCNs on CPU-FPGA heterogeneous systems.
arXiv Detail & Related papers (2019-12-31T21:19:01Z)