GraphGen+: Advancing Distributed Subgraph Generation and Graph Learning On Industrial Graphs
- URL: http://arxiv.org/abs/2503.06212v2
- Date: Wed, 02 Apr 2025 20:44:51 GMT
- Title: GraphGen+: Advancing Distributed Subgraph Generation and Graph Learning On Industrial Graphs
- Authors: Yue Jin, Yongchao Liu, Chuntao Hong
- Abstract summary: Graph-based computations are crucial in a wide range of applications, where graphs can scale to trillions of edges. Existing solutions face significant trade-offs: online subgraph generation is limited to a single machine, resulting in severe performance bottlenecks. We propose GraphGen+, an integrated framework that synchronizes distributed subgraph generation with in-memory graph learning.
- Score: 9.024357901512928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph-based computations are crucial in a wide range of applications, where graphs can scale to trillions of edges. To enable efficient training on such large graphs, mini-batch subgraph sampling is commonly used, which allows training without loading the entire graph into memory. However, existing solutions face significant trade-offs: online subgraph generation, as seen in frameworks like DGL and PyG, is limited to a single machine, resulting in severe performance bottlenecks, while offline precomputed subgraphs, as in GraphGen, improve sampling efficiency but introduce large storage overhead and high I/O costs during training. To address these challenges, we propose GraphGen+, an integrated framework that synchronizes distributed subgraph generation with in-memory graph learning, eliminating the need for external storage while significantly improving efficiency. GraphGen+ achieves a 27× speedup in subgraph generation compared to conventional SQL-like methods and a 1.3× speedup over GraphGen, supporting training on 1 million nodes per iteration and removing the overhead associated with precomputed subgraphs, making it a scalable and practical solution for industry-scale graph learning.
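To make the trade-off concrete, the "online subgraph generation" baseline that the abstract contrasts against looks roughly like the following mini-batch neighbor-sampling loop in PyG. This is a generic single-machine sketch with a placeholder dataset, fan-outs, and batch size; it is not GraphGen+ code or its actual configuration.

```python
# Minimal sketch of online mini-batch subgraph sampling in PyG -- the
# single-machine baseline the abstract contrasts with. The dataset, fan-outs
# and batch size below are illustrative placeholders.
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader

data = Planetoid(root="/tmp/Cora", name="Cora")[0]  # small demo graph

loader = NeighborLoader(
    data,
    num_neighbors=[10, 10],        # fan-out per GNN layer (2-hop sampling)
    batch_size=128,                # seed nodes per mini-batch
    input_nodes=data.train_mask,   # sample subgraphs around training nodes
    shuffle=True,
)

for batch in loader:
    # Each `batch` is a subgraph sampled on the fly, so the full graph never
    # has to fit in GPU memory. At trillion-edge scale, performing this
    # per-iteration sampling on a single machine is the bottleneck that
    # distributed, synchronized subgraph generation aims to remove.
    pass
```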
Related papers
- Distributed Graph Neural Network Inference With Just-In-Time Compilation For Industry-Scale Graphs [6.924892368183222]
Graph neural networks (GNNs) have delivered remarkable results in various fields. The rapid increase in the scale of graph data has introduced significant performance bottlenecks for GNN inference. This paper introduces an innovative processing paradigm for distributed graph learning that abstracts GNNs with a new set of programming interfaces.
arXiv Detail & Related papers (2025-03-08T13:26:59Z) - Exact Acceleration of Subgraph Graph Neural Networks by Eliminating Computation Redundancy [49.233339837170895]
This paper introduces Ego-Nets-Fit-All (ENFA), a model that uniformly takes the smaller ego nets as subgraphs. ENFA can reduce storage space by 29.0% to 84.5% and improve training efficiency by up to 1.66x.
arXiv Detail & Related papers (2024-12-24T03:21:03Z) - GraphScale: A Framework to Enable Machine Learning over Billion-node Graphs [6.418397511692011]
We propose a unified framework for both supervised and unsupervised learning that stores and processes large graph data in a distributed manner.
The key insight in our design is the separation of workers who store data and those who perform the training.
Our experiments show that GraphScale outperforms state-of-the-art methods for distributed training of both GNNs and node embeddings.
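The "storage workers vs. training workers" separation described above can be illustrated with a deliberately simplified, hypothetical sketch: one process owns the feature matrix and serves per-batch lookups, while the trainer only ever sees the features of its current mini-batch. None of the names below come from GraphScale's actual API.

```python
# Hypothetical toy illustration of separating storage workers from training
# workers (not GraphScale code): the storage process owns the full feature
# matrix; the trainer requests only the features for each mini-batch.
import multiprocessing as mp
import numpy as np

def storage_worker(features, request_q, reply_q):
    """Owns the feature matrix and answers per-batch feature lookups."""
    while True:
        node_ids = request_q.get()
        if node_ids is None:                 # shutdown signal
            break
        reply_q.put(features[node_ids])

def training_worker(request_q, reply_q, batches):
    """Never materializes the full graph; fetches features batch by batch."""
    for node_ids in batches:
        request_q.put(node_ids)
        batch_feats = reply_q.get()
        # ... one GNN training step on batch_feats would go here ...
        print("trained on a batch of", batch_feats.shape[0], "nodes")

if __name__ == "__main__":
    feats = np.random.rand(10_000, 64).astype(np.float32)             # toy store
    req, rep = mp.Queue(), mp.Queue()
    batches = [np.random.randint(0, 10_000, size=512) for _ in range(3)]
    store = mp.Process(target=storage_worker, args=(feats, req, rep))
    store.start()
    training_worker(req, rep, batches)
    req.put(None)                                                      # stop storage
    store.join()
```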
arXiv Detail & Related papers (2024-07-22T08:09:36Z) - Learning on Large Graphs using Intersecting Communities [13.053266613831447]
MPNNs iteratively update each node's representation in an input graph by aggregating messages from the node's neighbors.
MPNN computation can quickly become prohibitive on large graphs unless they are very sparse.
We propose approximating the input graph as an intersecting community graph (ICG) -- a combination of intersecting cliques.
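For reference, the message-passing update referred to above is the standard MPNN form (generic notation assumed here, not taken from the ICG paper):

$$ h_v^{(k)} = \phi\left(h_v^{(k-1)},\; \bigoplus_{u \in \mathcal{N}(v)} \psi\left(h_v^{(k-1)}, h_u^{(k-1)}\right)\right) $$

where $\psi$ builds messages, $\bigoplus$ is a permutation-invariant aggregation (sum, mean, max) over the neighborhood $\mathcal{N}(v)$, and $\phi$ updates the node state. Each layer touches every edge once, so per-layer cost grows with the number of edges; this is why dense large graphs become prohibitive and motivates approximating the input with intersecting cliques.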
arXiv Detail & Related papers (2024-05-31T09:26:26Z) - Distributed Graph Embedding with Information-Oriented Random Walks [16.290803469068145]
Graph embedding maps graph nodes to low-dimensional vectors, and is widely adopted in machine learning tasks.
We present a general-purpose, distributed, information-centric random walk-based graph embedding framework, DistGER, which can scale to embed billion-edge graphs.
DistGER exhibits 2.33x-129x acceleration, a 45% reduction in cross-machine communication, and >10% effectiveness improvement in downstream tasks.
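As background for the walk-based embedding pipeline mentioned above, a generic uniform random-walk sampler (DeepWalk-style) is sketched below. DistGER's information-oriented walks use a more targeted strategy that is not reproduced here, and the graph and parameters are toy placeholders.

```python
# Generic uniform random-walk sampler (DeepWalk-style), for illustration only;
# DistGER's information-oriented walk strategy is not reproduced here.
import random

def random_walks(adj, walk_length=10, walks_per_node=5, seed=0):
    """adj: dict mapping each node to a list of neighbor nodes."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break                      # dead end: stop this walk early
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks  # these node sequences are then fed to a skip-gram model

# Toy example
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(random_walks(adj, walk_length=5, walks_per_node=2)[:3])
```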
arXiv Detail & Related papers (2023-03-28T03:11:21Z) - Scalable Graph Convolutional Network Training on Distributed-Memory Systems [5.169989177779801]
Graph Convolutional Networks (GCNs) are extensively utilized for deep learning on graphs.
Since the convolution operation on graphs induces irregular memory access patterns, designing a memory- and communication-efficient parallel algorithm for GCN training poses unique challenges.
We propose a highly parallel training algorithm that scales to large processor counts.
arXiv Detail & Related papers (2022-12-09T17:51:13Z) - DOTIN: Dropping Task-Irrelevant Nodes for GNNs [119.17997089267124]
Recent graph learning approaches have introduced the pooling strategy to reduce the size of graphs for learning.
We design a new approach called DOTIN (Dropping Task-Irrelevant Nodes) to reduce the size of graphs.
Our method speeds up GAT by about 50% on graph-level tasks including graph classification and graph edit distance.
arXiv Detail & Related papers (2022-04-28T12:00:39Z) - Scaling R-GCN Training with Graph Summarization [71.06855946732296]
Training of Relation Graph Convolutional Networks (R-GCN) does not scale well with the size of the graph.
In this work, we experiment with the use of graph summarization techniques to compress the graph.
We obtain reasonable results on the AIFB, MUTAG and AM datasets.
arXiv Detail & Related papers (2022-03-05T00:28:43Z) - Neural Graph Matching for Pre-training Graph Neural Networks [72.32801428070749]
Graph neural networks (GNNs) have shown a powerful capacity for modeling structural data.
We present a novel Graph Matching based GNN Pre-Training framework, called GMPT.
The proposed method can be applied to fully self-supervised pre-training and coarse-grained supervised pre-training.
arXiv Detail & Related papers (2022-03-03T09:53:53Z) - Distributed Training of Graph Convolutional Networks using Subgraph Approximation [72.89940126490715]
We propose a training strategy that mitigates the information lost across multiple partitions of a graph through a subgraph approximation scheme.
The subgraph approximation approach helps the distributed training system converge at single-machine accuracy.
arXiv Detail & Related papers (2020-12-09T09:23:49Z) - Multilevel Graph Matching Networks for Deep Graph Similarity Learning [79.3213351477689]
We propose a multi-level graph matching network (MGMN) framework for computing the graph similarity between any pair of graph-structured objects.
To compensate for the lack of standard benchmark datasets, we have created and collected a set of datasets for both the graph-graph classification and graph-graph regression tasks.
Comprehensive experiments demonstrate that MGMN consistently outperforms state-of-the-art baseline models on both the graph-graph classification and graph-graph regression tasks.
arXiv Detail & Related papers (2020-07-08T19:48:19Z)