Exploiting On-chip Heterogeneity of Versal Architecture for GNN
Inference Acceleration
- URL: http://arxiv.org/abs/2308.02749v1
- Date: Fri, 4 Aug 2023 23:57:55 GMT
- Title: Exploiting On-chip Heterogeneity of Versal Architecture for GNN
Inference Acceleration
- Authors: Paul Chen, Pavan Manjunath, Sasindu Wijeratne, Bingyi Zhang, Viktor
Prasanna
- Abstract summary: Graph Neural Networks (GNNs) have revolutionized many Machine Learning (ML) applications, such as social network analysis, bioinformatics, etc.
We leverage the heterogeneous computing capabilities of AMD Versal ACAP architecture to accelerate GNN inference.
For Graph Convolutional Network (GCN) inference, our approach leads to a speedup of 3.9-96.7x compared to designs using PL only on the same ACAP device.
- Score: 0.5249805590164902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph Neural Networks (GNNs) have revolutionized many Machine Learning (ML)
applications, such as social network analysis, bioinformatics, etc. GNN
inference can be accelerated by exploiting data sparsity in the input graph,
vertex features, and intermediate data in GNN computations. For dynamic
sparsity exploitation, we leverage the heterogeneous computing capabilities of
AMD Versal ACAP architecture to accelerate GNN inference. We develop a custom
hardware module that executes the sparse primitives of the computation kernel
on the Programmable Logic (PL) and efficiently computes the dense primitives
using the AI Engine (AIE). To exploit data sparsity during inference, we devise
a runtime kernel mapping strategy that dynamically assigns computation tasks to
the PL and AIE based on data sparsity. Our implementation on the VCK5000 ACAP
platform leads to superior performance compared with the state-of-the-art
implementations on CPU, GPU, ACAP, and other custom GNN accelerators. Compared
with these implementations, we achieve significant average runtime speedup
across various models and datasets of 162.42x, 17.01x, 9.90x, and 27.23x,
respectively. Furthermore, for Graph Convolutional Network (GCN) inference, our
approach leads to a speedup of 3.9-96.7x compared to designs using PL only on
the same ACAP device.
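The runtime kernel mapping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the density threshold, the function names, and the use of plain Python routines as stand-ins for the PL (sparse) and AIE (dense) units are all assumptions made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Hypothetical cutoff; the abstract does not state a threshold value.
DENSITY_THRESHOLD = 0.5


def density(m: Matrix) -> float:
    """Fraction of nonzero entries in m."""
    total = sum(len(row) for row in m)
    nonzero = sum(1 for row in m for v in row if v != 0)
    return nonzero / total if total else 0.0


def dense_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Stand-in for a dense primitive (the AIE role in the abstract)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def sparse_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Stand-in for a sparse primitive (the PL role): skips zeros in a."""
    n_cols = len(b[0]) if b else 0
    out = [[0.0] * n_cols for _ in a]
    for i, row in enumerate(a):
        for k, v in enumerate(row):
            if v != 0:
                for j, w in enumerate(b[k]):
                    out[i][j] += v * w
    return out


def dispatch(a: Matrix, b: Matrix,
             threshold: float = DENSITY_THRESHOLD) -> Tuple[str, Matrix]:
    """Pick a kernel at runtime from the measured density of the left operand."""
    if density(a) < threshold:
        return "PL", sparse_matmul(a, b)
    return "AIE", dense_matmul(a, b)
```

Calling `dispatch(adjacency, features)` returns the label of the unit chosen together with the product, so the same call site handles both a sparse adjacency matrix and a dense intermediate result. A real design would also weigh data movement cost between the two units, which this sketch ignores.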
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- Enabling Accelerators for Graph Computing [0.0]
Graph Neural Networks (GNNs) offer a novel paradigm for learning on graph-structured data.
GNNs present new computational challenges compared to conventional neural networks.
This thesis aims to develop a better understanding of how GNNs interact with the underlying hardware.
arXiv Detail & Related papers (2023-12-16T23:31:20Z)
- Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN).
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs).
Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z)
- Hardware/Software Co-Programmable Framework for Computational SSDs to
Accelerate Deep Learning Service on Large-Scale Graphs [8.698995648930806]
Graph neural networks (GNNs) process large-scale graphs consisting of a hundred billion edges.
We propose a novel deep learning framework on large graphs, HolisticGNN, that provides an easy-to-use, near-storage inference infrastructure for fast, energy-efficient GNN processing.
arXiv Detail & Related papers (2022-01-23T06:08:18Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and
Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- Fully-parallel Convolutional Neural Network Hardware [0.7829352305480285]
We propose a new power-and-area-efficient architecture for implementing Artificial Neural Networks (ANNs) in hardware.
For the first time, a fully-parallel CNN such as LeNet-5 is embedded and tested in a single FPGA.
arXiv Detail & Related papers (2020-06-22T17:19:09Z)
- Compiling Spiking Neural Networks to Neuromorphic Hardware [4.273223677453178]
Spiking Neural Networks (SNNs) can lower the energy consumption of machine learning applications executed on neuromorphic hardware.
We propose an approach to analyze and compile SNNs on resource-constrained neuromorphic hardware.
arXiv Detail & Related papers (2020-04-07T21:13:27Z)
- GraphACT: Accelerating GCN Training on CPU-FPGA Heterogeneous Platforms [1.2183405753834562]
Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art deep learning model for representation learning on graphs.
It is challenging to accelerate training of GCNs due to substantial and irregular data communication.
We design a novel accelerator for training GCNs on CPU-FPGA heterogeneous systems.
arXiv Detail & Related papers (2019-12-31T21:19:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.