Related papers: PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch

PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch

URL: http://arxiv.org/abs/2503.19779v1
Date: Tue, 25 Mar 2025 15:47:54 GMT
Title: PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch
Authors: Abhishek Ghosh, Ajay Nayak, Ashish Panwar, Arkaprava Basu,
Abstract summary: We introduce PyGraph, a novel approach to automatically harness the power of NVIDIA Graphs within PyTorch2.<n>We evaluate PyGraph across various machine learning benchmarks, demonstrating substantial performance improvements over PyTorch2.
Score: 1.2334708058524546
License: http://creativecommons.org/licenses/by/4.0/
Abstract: CUDA Graphs -- a recent hardware feature introduced for NVIDIA GPUs -- aim to reduce CPU launch overhead by capturing and launching a series of GPU tasks (kernels) as a DAG. However, deploying CUDA Graphs faces several challenges today due to the static structure of a graph. It also incurs performance overhead due to data copy. In fact, we show a counter-intuitive result -- deploying CUDA Graphs hurts performance in many cases. We introduce PyGraph, a novel approach to automatically harness the power of CUDA Graphs within PyTorch2. Driven by three key observations, PyGraph embodies three novel optimizations: it enables wider deployment of CUDA Graphs, reduces GPU kernel parameter copy overheads, and selectively deploys CUDA Graphs based on a cost-benefit analysis. PyGraph seamlessly integrates with PyTorch2's compilation toolchain, enabling efficient use of CUDA Graphs without manual modifications to the code. We evaluate PyGraph across various machine learning benchmarks, demonstrating substantial performance improvements over PyTorch2.

Related papers

A User's Guide to $\texttt{KSig}$: GPU-Accelerated Computation of the Signature Kernel [12.111848705677138]
The signature kernel is a positive definite kernel for sequential and temporal data.<n>In this chapter, we give a short introduction to $textttKSig$, a $textttScikit-Learn$ compatible Python package that implements various GPU-accelerated algorithms for computing signature kernels.
arXiv Detail & Related papers (2025-01-13T09:11:13Z)
GraphStorm: all-in-one graph machine learning framework for industry applications [75.23076561638348]
GraphStorm is an end-to-end solution for scalable graph construction, graph model training and inference. Every component in GraphStorm can operate on graphs with billions of nodes and can scale model training and inference to different hardware without changing any code. GraphStorm has been used and deployed for over a dozen billion-scale industry applications after its release in May 2023.
arXiv Detail & Related papers (2024-06-10T04:56:16Z)
iSpLib: A Library for Accelerating Graph Neural Networks using Auto-tuned Sparse Operations [1.3030767447016454]
iSpLib is a PyTorch-based C++ library equipped with auto-tuned sparse operations. We demonstrate that iSpLib obtains up to 27x overall training speedup compared to the equivalent PyTorch 2.1.0 and PyTorch Geometric 2.4.0 implementations on the CPU.
arXiv Detail & Related papers (2024-03-21T21:56:44Z)
PockEngine: Sparse and Efficient Fine-tuning in a Pocket [62.955793932377524]
We introduce PockEngine: a tiny, sparse and efficient engine to enable fine-tuning on various edge devices. PockEngine supports sparse backpropagation and sparsely updates the model with measured memory saving and latency reduction. Remarkably, PockEngine enables fine-tuning LLaMav2-7B on NVIDIA Jetson AGX Orin at 550 tokens/s, 7.9$times$ faster than the PyTorch.
arXiv Detail & Related papers (2023-10-26T19:46:11Z)
TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs [24.790481918123103]
This paper introduces TpuGraphs, a performance prediction dataset on full tensor programs. Each graph in the dataset represents the main computation of a machine learning workload. TpuGraphs provides 25x more graphs than the largest graph property prediction dataset.
arXiv Detail & Related papers (2023-08-25T17:04:35Z)
PyGOD: A Python Library for Graph Outlier Detection [56.33769221859135]
PyGOD is an open-source library for detecting outliers in graph data. It supports a wide array of leading graph-based methods for outlier detection. PyGOD is released under a BSD 2-Clause license at https://pygod.org and at the Python Package Index (PyPI)
arXiv Detail & Related papers (2022-04-26T06:15:21Z)
Scaling R-GCN Training with Graph Summarization [71.06855946732296]
Training of Relation Graph Convolutional Networks (R-GCN) does not scale well with the size of the graph. In this work, we experiment with the use of graph summarization techniques to compress the graph. We obtain reasonable results on the AIFB, MUTAG and AM datasets.
arXiv Detail & Related papers (2022-03-05T00:28:43Z)
Boosting Graph Embedding on a Single GPU [3.093890460224435]
We present GOSH, a GPU-based tool for embedding large-scale graphs with minimum hardware constraints. It employs a novel graph coarsening algorithm to enhance the impact of updates and minimize the work for embedding. It also incorporates a decomposition schema that enables any arbitrarily large graph to be embedded with a single GPU.
arXiv Detail & Related papers (2021-10-19T15:25:04Z)
GNNAutoScale: Scalable and Expressive Graph Neural Networks via Historical Embeddings [51.82434518719011]
GNNAutoScale (GAS) is a framework for scaling arbitrary message-passing GNNs to large graphs. Gas prunes entire sub-trees of the computation graph by utilizing historical embeddings from prior training iterations. Gas reaches state-of-the-art performance on large-scale graphs.
arXiv Detail & Related papers (2021-06-10T09:26:56Z)
Efficient Graph Deep Learning in TensorFlow with tf_geometric [53.237754811019464]
We introduce tf_geometric, an efficient and friendly library for graph deep learning. tf_geometric provides kernel libraries for building Graph Neural Networks (GNNs) as well as implementations of popular GNNs. The kernel libraries consist of infrastructures for building efficient GNNs, including graph data structures, graph map-reduce framework, graph mini-batch strategy, etc.
arXiv Detail & Related papers (2021-01-27T17:16:36Z)
FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems [23.258185277825888]
FeatGraph provides a flexible programming interface to express diverse GNN models. FeatGraph speeds up end-to-end GNN training and inference by up to 32x on CPU and 7x on GPU.
arXiv Detail & Related papers (2020-08-26T03:17:05Z)
Deep Graph Library Optimizations for Intel(R) x86 Architecture [3.518762870118332]
We present performance analysis, optimizations and results across a set of Graph Neural Networks (GNN) applications using the latest version of Deep Graph Library (DGL) Across 7 applications, we achieve speed-ups ranging from1 1.5x-13x over the baseline CPU implementations.
arXiv Detail & Related papers (2020-07-13T12:57:16Z)
Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs. In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings. We show that training PPRGo and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
arXiv Detail & Related papers (2020-07-03T09:30:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.