Related papers: MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs

MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs

URL: http://arxiv.org/abs/2410.22697v2
Date: Sun, 03 Nov 2024 18:27:13 GMT
Title: MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs
Authors: Aishwarya Sarkar, Sayan Ghosh, Nathan R. Tallent, Ali Jannesari,
Abstract summary: This paper develops a parameterized continuous prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework. It demonstrates about 15-40% improvement in end-to-end training performance on the National Energy Research Scientific Computing Center's (NERSC) Perlmutter supercomputer.
Score: 11.026326555186333
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Graph Neural Networks (GNN) are indispensable in learning from graph-structured data, yet their rising computational costs, especially on massively connected graphs, pose significant challenges in terms of execution performance. To tackle this, distributed-memory solutions such as partitioning the graph to concurrently train multiple replicas of GNNs are in practice. However, approaches requiring a partitioned graph usually suffer from communication overhead and load imbalance, even under optimal partitioning and communication strategies due to irregularities in the neighborhood minibatch sampling. This paper proposes practical trade-offs for improving the sampling and communication overheads for representation learning on distributed graphs (using popular GraphSAGE architecture) by developing a parameterized continuous prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework, demonstrating about 15-40% improvement in end-to-end training performance on the National Energy Research Scientific Computing Center's (NERSC) Perlmutter supercomputer for various OGB datasets.

Related papers

Distributed Graph Neural Network Inference With Just-In-Time Compilation For Industry-Scale Graphs [6.924892368183222]
Graph neural networks (GNNs) have delivered remarkable results in various fields. The rapid increase in the scale of graph data has introduced significant performance bottlenecks for GNN inference. This paper introduces an innovative processing paradgim for distributed graph learning that abstracts GNNs with a new set of programming interfaces.
arXiv Detail & Related papers (2025-03-08T13:26:59Z)
OMEGA: A Low-Latency GNN Serving System for Large Graphs [8.51634655687174]
Graph Neural Networks (GNNs) have been widely adopted for their ability to compute expressive node representations in graph datasets. Existing approximation techniques in training can mitigate the overheads but, in serving, still lead to high latency and/or accuracy loss. We propose OMEGA, a system that enables low-latency GNN serving for large graphs with minimal accuracy loss.
arXiv Detail & Related papers (2025-01-15T03:14:18Z)
Distributed Training of Large Graph Neural Networks with Variable Communication Rates [71.7293735221656]
Training Graph Neural Networks (GNNs) on large graphs presents unique challenges due to the large memory and computing requirements. Distributed GNN training, where the graph is partitioned across multiple machines, is a common approach to training GNNs on large graphs. We introduce a variable compression scheme for reducing the communication volume in distributed GNN training without compromising the accuracy of the learned model.
arXiv Detail & Related papers (2024-06-25T14:57:38Z)
GLISP: A Scalable GNN Learning System by Exploiting Inherent Structural Properties of Graphs [5.410321469222541]
We propose GLISP, a sampling based GNN learning system for industrial scale graphs. GLISP consists of three core components: graph partitioner, graph sampling service and graph inference engine. Experiments show that GLISP achieves up to $6.53times$ and $70.77times$ speedups over existing GNN systems for training and inference tasks.
arXiv Detail & Related papers (2024-01-06T02:59:24Z)
Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs. Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors. We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN)
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training. We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
ABC: Aggregation before Communication, a Communication Reduction Framework for Distributed Graph Neural Network Training and Effective Partition [0.0]
Graph Neural Networks (GNNs) are neural models tailored for graph-structure data and have shown superior performance in learning representations for graph-structured data. In this paper, we study the communication complexity during distributed GNNs training. We show that the new partition paradigm is particularly ideal in the case of dynamic graphs where it is infeasible to control the edge placement due to the unknown of the graph-changing process.
arXiv Detail & Related papers (2022-12-11T04:54:01Z)
Scalable Graph Convolutional Network Training on Distributed-Memory Systems [5.169989177779801]
Graph Convolutional Networks (GCNs) are extensively utilized for deep learning on graphs. Since the convolution operation on graphs induces irregular memory access patterns, designing a memory- and communication-efficient parallel algorithm for GCN training poses unique challenges. We propose a highly parallel training algorithm that scales to large processor counts.
arXiv Detail & Related papers (2022-12-09T17:51:13Z)
A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs) We present a new ensembling training manner, named EnGCN, to address the existing issues. Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks [52.566735716983956]
We propose a graph gradual pruning framework termed CGP to dynamically prune GNNs. Unlike LTH-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs. Our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
arXiv Detail & Related papers (2022-07-18T14:23:31Z)
Learning Graph Structure from Convolutional Mixtures [119.45320143101381]
We propose a graph convolutional relationship between the observed and latent graphs, and formulate the graph learning task as a network inverse (deconvolution) problem. In lieu of eigendecomposition-based spectral methods, we unroll and truncate proximal gradient iterations to arrive at a parameterized neural network architecture that we call a Graph Deconvolution Network (GDN) GDNs can learn a distribution of graphs in a supervised fashion, perform link prediction or edge-weight regression tasks by adapting the loss function, and they are inherently inductive.
arXiv Detail & Related papers (2022-05-19T14:08:15Z)
BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing [0.0]
Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data. Existing systems are inefficient to train large graphs with billions of nodes and edges with GPUs. This paper proposes BGL, a distributed GNN training system designed to address the bottlenecks with a few key ideas.
arXiv Detail & Related papers (2021-12-16T00:37:37Z)
Distributed Optimization of Graph Convolutional Network using Subgraph Variance [8.510726499008204]
We propose a Graph Augmentation based Distributed GCN framework(GAD) GAD has two main components, GAD-Partition and GAD-r. Our framework significantly reduces the communication overhead 50%, improves the convergence speed (2X) and slight gain in accuracy (0.45%) based on minimal redundancy compared to the state-of-the-art methods.
arXiv Detail & Related papers (2021-10-06T18:01:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.