Operation-Level Performance Benchmarking of Graph Neural Networks for
Scientific Applications
- URL: http://arxiv.org/abs/2207.09955v1
- Date: Wed, 20 Jul 2022 15:01:12 GMT
- Title: Operation-Level Performance Benchmarking of Graph Neural Networks for
Scientific Applications
- Authors: Ryien Hosseini, Filippo Simini, Venkatram Vishwanath
- Abstract summary: We profile and select low-level operations pertinent to Graph Neural Networks (GNNs) for scientific computing, as implemented in the PyTorch Geometric software framework.
These are then rigorously benchmarked on NVIDIA A100 GPUs for various combinations of input values, including tensor sparsity.
At a high level, we conclude that on NVIDIA systems, confounding bottlenecks such as memory inefficiency often dominate runtime costs more than data sparsity alone.
We hope that these results serve as a baseline for those developing these operations on specialized hardware and that our subsequent analysis helps to facilitate future software- and hardware-based optimizations of these operations, and thus scalable GNN performance as a whole.
- Score: 0.15469452301122172
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As Graph Neural Networks (GNNs) increase in popularity for scientific machine
learning, their training and inference efficiency is becoming increasingly
critical. Additionally, the deep learning field as a whole is trending towards
wider and deeper networks, and ever-increasing data sizes, to the point where
hardware bottlenecks are often encountered. Emerging specialty hardware
platforms provide an exciting solution to this problem. In this paper, we
systematically profile and select low-level operations pertinent to GNNs for
scientific computing, as implemented in the PyTorch Geometric software framework.
These are then rigorously benchmarked on NVIDIA A100 GPUs for various
combinations of input values, including tensor sparsity. We then analyze these
results for each operation. At a high level, we conclude that on NVIDIA
systems: (1) confounding bottlenecks such as memory inefficiency often dominate
runtime costs more than data sparsity alone, (2) native PyTorch operations
are often as competitive as, or more competitive than, their PyTorch Geometric
equivalents, especially at low to moderate levels of input data sparsity, and (3) many
operations central to state-of-the-art GNN architectures have little to no
optimization for sparsity. We hope that these results serve as a baseline for
those developing these operations on specialized hardware and that our
subsequent analysis helps to facilitate future software- and hardware-based
optimizations of these operations and thus scalable GNN performance as a whole.
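To make the operation-level benchmarking concrete, the sketch below is a minimal, hypothetical harness (not the authors' actual benchmark code) that times a dense matmul against a sparse (COO) matrix-matrix product at several sparsity levels using CUDA events, the kind of comparison behind conclusion (1). The tensor shapes, sparsity levels, and iteration counts are illustrative assumptions.

```python
# Hypothetical sketch of an operation-level micro-benchmark in the spirit of the
# paper: dense matmul vs. sparse (COO) matmul at several sparsity levels.
# Shapes, sparsity levels, and iteration counts are illustrative assumptions.
import torch

def cuda_time_ms(fn, warmup=10, iters=50):
    """Return mean milliseconds per call, measured with CUDA events."""
    for _ in range(warmup):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

device = "cuda"
n, d = 4096, 256                           # illustrative problem size
feats = torch.randn(n, d, device=device)   # node-feature-like dense operand

for sparsity in (0.5, 0.9, 0.99):          # target fraction of zero entries
    dense = torch.rand(n, n, device=device)
    dense[dense < sparsity] = 0.0          # zero out entries to reach the target
    sparse = dense.to_sparse()             # COO representation of the same matrix

    t_dense = cuda_time_ms(lambda: dense @ feats)
    t_sparse = cuda_time_ms(lambda: torch.sparse.mm(sparse, feats))
    print(f"sparsity={sparsity:.2f}  dense={t_dense:.3f} ms  sparse={t_sparse:.3f} ms")
```

A similar harness can compare the scatter-style aggregation used by PyTorch Geometric (via the torch_scatter package) against native torch.Tensor.index_add_ or scatter_add_, which relates to conclusion (2); timing with CUDA events rather than Python wall-clock time avoids under-counting asynchronous kernel launches.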
Related papers
- AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs [26.607519045805745]
Graph neural networks (GNNs) are powerful tools for exploring and learning from graph structures and features.
Prior works have proposed exploiting sparsity in the input graph to accelerate GNNs, using full-graph-level or block-level sparsity formats.
We show that these approaches fail to balance the benefit of sparsity against kernel execution efficiency.
We propose a novel system, referred to as AdaptGear, that addresses the challenge of optimizing GNN performance.
arXiv Detail & Related papers (2023-05-27T08:22:12Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- Hardware/Software Co-Programmable Framework for Computational SSDs to Accelerate Deep Learning Service on Large-Scale Graphs [8.698995648930806]
Graph neural networks (GNNs) process large-scale graphs that can consist of a hundred billion edges.
We propose a novel deep learning framework on large graphs, HolisticGNN, that provides an easy-to-use, near-storage inference infrastructure for fast, energy-efficient GNN processing.
arXiv Detail & Related papers (2022-01-23T06:08:18Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study [100.27567794045045]
Training deep graph neural networks (GNNs) is notoriously hard.
We present the first fair and reproducible benchmark dedicated to assessing the "tricks" of training deep GNNs.
arXiv Detail & Related papers (2021-08-24T05:00:37Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Learning on Hardware: A Tutorial on Neural Network Accelerators and Co-Processors [0.0]
Deep neural networks (DNNs) have the advantage that they can take into account a large number of parameters, which enables them to solve complex tasks.
In computer vision and speech recognition, they achieve better accuracy than conventional algorithms, and in some tasks they even surpass human experts.
With the progress of DNNs in recent years, many other fields of application such as diagnosis of diseases and autonomous driving are taking advantage of them.
arXiv Detail & Related papers (2021-04-19T12:50:27Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- Analyzing the Performance of Graph Neural Networks with Pipe Parallelism [2.269587850533721]
We focus on Graph Neural Networks (GNNs) that have found great success in tasks such as node or edge classification and link prediction.
New approaches for processing larger networks are needed to advance graph techniques.
We study how GNNs could be parallelized using existing tools and frameworks that are known to be successful in the deep learning community.
arXiv Detail & Related papers (2020-12-20T04:20:38Z)
- Not Half Bad: Exploring Half-Precision in Graph Convolutional Neural Networks [8.460826851547294]
Efficient graph analysis using modern machine learning is receiving a growing level of attention.
Deep learning approaches often operate over the entire adjacency matrix.
It is desirable to identify efficient measures to reduce both run-time and memory requirements; a hedged half-precision sketch follows this entry.
arXiv Detail & Related papers (2020-10-23T19:47:42Z)
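To make the half-precision idea concrete, here is a minimal, hypothetical sketch (not code from that paper) of a GCN-style aggregation, X' = A_hat @ (X @ W), run under PyTorch's autocast in float16; the random adjacency, its normalization, and all sizes are illustrative assumptions.

```python
# Hedged illustration of a GCN-style aggregation executed under mixed precision.
# The random adjacency, its normalization, and all sizes are assumptions only.
import torch

device = "cuda"
n, d_in, d_out = 2048, 128, 64

adj = (torch.rand(n, n, device=device) < 0.01).float()       # sparse-ish random adjacency
adj = adj + torch.eye(n, device=device)                       # add self-loops
deg_inv_sqrt = adj.sum(dim=1).clamp(min=1.0).pow(-0.5)
a_hat = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]   # symmetric normalization

x = torch.randn(n, d_in, device=device)
lin = torch.nn.Linear(d_in, d_out, bias=False).to(device)

with torch.autocast(device_type="cuda", dtype=torch.float16):
    out = a_hat @ lin(x)   # matmuls run in float16 inside the autocast region

print(out.dtype)           # torch.float16
```

Keeping the adjacency and feature matmuls in half precision can roughly halve their memory traffic, which is the kind of run-time and memory saving the entry above alludes to.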