iSpLib: A Library for Accelerating Graph Neural Networks using Auto-tuned Sparse Operations
- URL: http://arxiv.org/abs/2403.14853v1
- Date: Thu, 21 Mar 2024 21:56:44 GMT
- Title: iSpLib: A Library for Accelerating Graph Neural Networks using Auto-tuned Sparse Operations
- Authors: Md Saidul Hoque Anik, Pranav Badhe, Rohit Gampa, Ariful Azad
- Abstract summary: iSpLib is a PyTorch-based C++ library equipped with auto-tuned sparse operations.
We demonstrate that iSpLib obtains up to 27x overall training speedup compared to the equivalent PyTorch 2.1.0 and PyTorch Geometric 2.4.0 implementations on the CPU.
- Score: 1.3030767447016454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Core computations in Graph Neural Network (GNN) training and inference are often mapped to sparse matrix operations such as sparse-dense matrix multiplication (SpMM). These sparse operations are harder to optimize by manual tuning because their performance depends significantly on the sparsity of input graphs, GNN models, and computing platforms. To address this challenge, we present iSpLib, a PyTorch-based C++ library equipped with auto-tuned sparse operations. iSpLib expedites GNN training with a cache-enabled backpropagation that stores intermediate matrices in local caches. The library offers a user-friendly Python plug-in that allows users to take advantage of our optimized PyTorch operations out-of-the-box for any existing linear algebra-based PyTorch implementation of popular GNNs (Graph Convolution Network, GraphSAGE, Graph Inference Network, etc.) with only two lines of additional code. We demonstrate that iSpLib obtains up to 27x overall training speedup compared to the equivalent PyTorch 2.1.0 and PyTorch Geometric 2.4.0 implementations on the CPU. Our library is publicly available at https://github.com/HipGraph/iSpLib (https://doi.org/10.5281/zenodo.10806511).
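A note on the core kernel: the message-passing step of a GCN-style layer reduces to a sparse-dense matrix multiplication (SpMM), which is exactly the operation iSpLib auto-tunes. The sketch below is a generic PyTorch illustration of that pattern, not iSpLib's own API; the two-line plug-in mentioned in the abstract is documented in the linked repository and is not reproduced here.

```python
# Minimal sketch (not iSpLib's API): a GCN-style layer whose core kernel is SpMM,
# the kind of operation a library such as iSpLib intercepts and auto-tunes.
import torch

def gcn_layer(adj_hat: torch.Tensor, x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """One graph-convolution step: H' = ReLU(A_hat (H W)).

    adj_hat : (N, N) sparse adjacency matrix (normalized in practice)
    x       : (N, F_in) dense node features
    weight  : (F_in, F_out) dense layer weight
    """
    xw = x @ weight                      # dense GEMM
    out = torch.sparse.mm(adj_hat, xw)   # SpMM: sparse adjacency times dense features
    return torch.relu(out)

# Toy usage on a 3-node graph
indices = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])
adj = torch.sparse_coo_tensor(indices, torch.ones(4), (3, 3)).coalesce()
h = gcn_layer(adj, torch.randn(3, 8), torch.randn(8, 16))
print(h.shape)  # torch.Size([3, 16])
```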
Related papers
- Nerva: a Truly Sparse Implementation of Neural Networks [16.29955529463831]
Nerva is a fast neural network library under development in C++.
It supports sparsity by using the sparse matrix operations of Intel's Math Kernel Library.
arXiv Detail & Related papers (2024-07-24T17:13:31Z) - PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures [10.047157906258196]
We introduce PyGim, an efficient ML library that accelerates Graph Neural Networks on real PIM systems.
We provide hybrid GNN execution, in which the compute-intensive and memory-intensive kernels are executed on processor-centric and memory-centric systems, respectively.
We extensively evaluate PyGim on a real-world PIM system with 1992 PIM cores using emerging GNN models, and demonstrate that it outperforms its state-of-the-art CPU counterpart on Intel Xeon by 3.04x on average.
arXiv Detail & Related papers (2024-02-26T16:52:35Z) - PyTorch Geometric High Order: A Unified Library for High Order Graph Neural Network [32.537428858455]
PyTorch Geometric High Order (PyGHO) is a library for High Order Graph Neural Networks (HOGNNs) that extends PyTorch Geometric (PyG).
We present a detailed, in-depth overview of PyGHO and compare HOGNNs implemented with PyGHO against their official implementations on real-world tasks.
arXiv Detail & Related papers (2023-11-28T10:34:48Z) - RSC: Accelerating Graph Neural Networks Training via Randomized Sparse Computations [56.59168541623729]
Training graph neural networks (GNNs) is time-consuming because sparse graph-based operations are hard to accelerate in hardware.
We explore trading off computational precision to reduce time complexity via sampling-based approximation.
We propose Randomized Sparse Computation, which for the first time demonstrates the potential of training GNNs with approximated operations.
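As a generic illustration of the sampling-based approximation idea in the RSC summary above (not the paper's algorithm), the classic randomized matrix product samples column/row pairs with probability proportional to their norms and rescales them to keep the estimate unbiased; larger sample sizes trade speed back for accuracy.

```python
# Illustrative only: norm-weighted column/row sampling to approximate A @ B,
# a generic example of trading precision for lower time complexity.
import numpy as np

def sampled_matmul(a: np.ndarray, b: np.ndarray, c: int, seed=None) -> np.ndarray:
    """Approximate a @ b using c sampled column/row pairs (c << inner dimension)."""
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=1)
    probs = norms / norms.sum()
    idx = rng.choice(a.shape[1], size=c, p=probs)
    scale = 1.0 / (c * probs[idx])            # importance weights keep the estimate unbiased
    return (a[:, idx] * scale) @ b[idx, :]

a, b = np.random.randn(256, 512), np.random.randn(512, 64)
approx = sampled_matmul(a, b, c=128, seed=0)
exact = a @ b
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))  # relative error
```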
arXiv Detail & Related papers (2022-10-19T17:25:33Z) - Neighbor2Seq: Deep Learning on Massive Graphs by Transforming Neighbors to Sequences [55.329402218608365]
We propose Neighbor2Seq to transform the hierarchical neighborhood of each node into a sequence.
We evaluate our method on a massive graph with more than 111 million nodes and 1.6 billion edges.
Results show that our proposed method is scalable to massive graphs and achieves superior performance across massive and medium-scale graphs.
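Per the summary above, Neighbor2Seq converts each node's hierarchical (hop-by-hop) neighborhood into a sequence that ordinary sequence models can consume. A hand-rolled sketch of that transformation, not the authors' implementation, might look like this:

```python
# Illustrative sketch (not the official Neighbor2Seq code): group each node's
# hop-0, hop-1, ..., hop-K neighbors by distance and emit one feature group per hop.
from collections import deque

def neighborhood_to_seq(adj: dict, features: dict, root, num_hops: int = 2):
    """BFS from `root`, grouping nodes by hop, and return one feature list per hop."""
    visited = {root}
    frontier = deque([root])
    sequence = []
    for _ in range(num_hops + 1):
        sequence.append([features[n] for n in frontier])   # one "token group" per hop
        next_frontier = deque()
        for n in frontier:
            for nb in adj.get(n, []):
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
        if not frontier:
            break
    return sequence

adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
feats = {i: [float(i)] for i in adj}
print(neighborhood_to_seq(adj, feats, root=0, num_hops=2))
```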
arXiv Detail & Related papers (2022-02-07T16:38:36Z) - VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose VersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator.
VersaGNN achieves, on average, a 3712x speedup with 1301.25x energy reduction over CPU, and a 35.4x speedup with 17.66x energy reduction over GPU.
arXiv Detail & Related papers (2021-05-04T04:10:48Z) - PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses [19.2129567657739]
We introduce PyTorch-Direct, which enables a GPU-centric data accessing paradigm for graph neural networks (GNNs) training.
Our microbenchmark and end-to-end GNN training results show that PyTorch-Direct reduces data transfer time by 47.1% on average and speeds up GNN training by up to 1.6x.
arXiv Detail & Related papers (2021-01-20T04:24:39Z) - Scalable Graph Neural Networks via Bidirectional Propagation [89.70835710988395]
Graph Neural Networks (GNNs) are an emerging approach for learning on non-Euclidean data.
This paper presents GBP, a scalable GNN that utilizes a localized bidirectional propagation process from both the feature vectors and the training/testing nodes.
An empirical study demonstrates that GBP achieves state-of-the-art performance with significantly less training/testing time.
arXiv Detail & Related papers (2020-10-29T08:55:33Z) - Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model, which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in a large graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
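PPRGo's efficient approximation of information diffusion is built on approximate personalized PageRank; the standard forward-push procedure below illustrates that building block and is not PPRGo's actual code:

```python
# Illustrative forward-push approximation of personalized PageRank (PPR),
# the kind of diffusion PPRGo approximates; not the authors' implementation.
from collections import defaultdict

def approx_ppr(adj: dict, source, alpha: float = 0.15, eps: float = 1e-4) -> dict:
    """Return a sparse PPR vector for `source`, pushing residual mass above eps per degree."""
    p = defaultdict(float)        # approximate PPR scores
    r = defaultdict(float)        # residual mass still to be pushed
    r[source] = 1.0
    queue = [source]
    while queue:
        u = queue.pop()
        deg = len(adj.get(u, [])) or 1
        if r[u] < eps * deg:
            continue
        push = r[u]
        p[u] += alpha * push      # keep a fraction at u
        r[u] = 0.0
        share = (1.0 - alpha) * push / deg
        for v in adj.get(u, []):  # spread the rest to the neighbors
            v_deg = len(adj.get(v, [])) or 1
            below_before = r[v] < eps * v_deg
            r[v] += share
            if below_before and r[v] >= eps * v_deg:
                queue.append(v)
    return dict(p)

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(approx_ppr(adj, source=0))
```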
arXiv Detail & Related papers (2020-07-03T09:30:07Z) - Hybrid Models for Learning to Branch [81.93868699246214]
We propose a new hybrid architecture for efficient branching on CPU machines.
The proposed architecture combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLPs) for branching.
arXiv Detail & Related papers (2020-06-26T21:03:45Z) - Kernel Operations on the GPU, with Autodiff, without Memory Overflows [5.669790037378094]
The KeOps library provides fast and memory-efficient GPU support for tensors whose entries are given by a mathematical formula.
KeOps alleviates the major bottleneck of tensor-centric libraries for kernel and geometric applications: memory consumption.
KeOps combines optimized C++/CUDA schemes with binders for high-level languages: Python (Numpy and PyTorch), Matlab and R.
arXiv Detail & Related papers (2020-03-27T08:54:10Z)
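To make the KeOps entry above concrete: its central abstraction is a symbolic LazyTensor whose entries are defined by a formula and reduced without ever materializing the full matrix. The snippet below follows the public KeOps tutorials; exact API details may differ between versions.

```python
# Sketch in the spirit of the KeOps tutorials: a Gaussian kernel matrix-vector
# product computed without materializing the (M, N) kernel matrix.
import torch
from pykeops.torch import LazyTensor

x = torch.randn(10_000, 3)            # M query points
y = torch.randn(20_000, 3)            # N reference points
b = torch.randn(20_000, 1)            # signal carried by the reference points

x_i = LazyTensor(x[:, None, :])       # symbolic (M, 1, 3)
y_j = LazyTensor(y[None, :, :])       # symbolic (1, N, 3)

d2_ij = ((x_i - y_j) ** 2).sum(-1)    # symbolic (M, N) squared distances
k_ij = (-d2_ij).exp()                 # symbolic Gaussian kernel matrix

out = k_ij @ b                        # reduction over j; returns a dense (M, 1) tensor
print(out.shape)
```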