Graph Learning at Scale: Characterizing and Optimizing Pre-Propagation GNNs
- URL: http://arxiv.org/abs/2504.13266v1
- Date: Thu, 17 Apr 2025 18:20:40 GMT
- Title: Graph Learning at Scale: Characterizing and Optimizing Pre-Propagation GNNs
- Authors: Zichao Yue, Chenhui Deng, Zhiru Zhang,
- Abstract summary: Pre-propagation GNNs represent a new class of models that decouple feature propagation from training through pre-processing.<n>This paper provides a comprehensive characterization of PP-GNNs, comparing them with graph-sampling-based methods in training efficiency, scalability, and accuracy.<n>We propose optimized data loading schemes and tailored training methods that improve PP-GNN training throughput by an average of 15$times$ over the PP-GNN baselines.
- Score: 9.21955649907066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph neural networks (GNNs) are widely used for learning node embeddings in graphs, typically adopting a message-passing scheme. This approach, however, leads to the neighbor explosion problem, with exponentially growing computational and memory demands as layers increase. Graph sampling has become the predominant method for scaling GNNs to large graphs, mitigating but not fully solving the issue. Pre-propagation GNNs (PP-GNNs) represent a new class of models that decouple feature propagation from training through pre-processing, addressing neighbor explosion in theory. Yet, their practical advantages and system-level optimizations remain underexplored. This paper provides a comprehensive characterization of PP-GNNs, comparing them with graph-sampling-based methods in training efficiency, scalability, and accuracy. While PP-GNNs achieve comparable accuracy, we identify data loading as the key bottleneck for training efficiency and input expansion as a major scalability challenge. To address these issues, we propose optimized data loading schemes and tailored training methods that improve PP-GNN training throughput by an average of 15$\times$ over the PP-GNN baselines, with speedup of up to 2 orders of magnitude compared to sampling-based GNNs on large graph benchmarks. Our implementation is publicly available at https://github.com/cornell-zhang/preprop-gnn.
Related papers
- SpanGNN: Towards Memory-Efficient Graph Neural Networks via Spanning Subgraph Training [14.63975787929143]
Graph Neural Networks (GNNs) have superior capability in learning graph data.
Full-graph GNN training generally has high accuracy, however, it suffers from large peak memory usage.
We propose a new memory-efficient GNN training method using spanning subgraph, called SpanGNN.
arXiv Detail & Related papers (2024-06-07T13:46:23Z) - Learning to Reweight for Graph Neural Network [63.978102332612906]
Graph Neural Networks (GNNs) show promising results for graph tasks.
Existing GNNs' generalization ability will degrade when there exist distribution shifts between testing and training graph data.
We propose a novel nonlinear graph decorrelation method, which can substantially improve the out-of-distribution generalization ability.
arXiv Detail & Related papers (2023-12-19T12:25:10Z) - Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN)
arXiv Detail & Related papers (2023-10-23T01:25:44Z) - Graph Ladling: Shockingly Simple Parallel GNN Training without
Intermediate Communication [100.51884192970499]
GNNs are a powerful family of neural networks for learning over graphs.
scaling GNNs either by deepening or widening suffers from prevalent issues of unhealthy gradients, over-smoothening, information squashing.
We propose not to deepen or widen current GNNs, but instead present a data-centric perspective of model soups tailored for GNNs.
arXiv Detail & Related papers (2023-06-18T03:33:46Z) - GNN-Ensemble: Towards Random Decision Graph Neural Networks [3.7620848582312405]
Graph Neural Networks (GNNs) have enjoyed wide spread applications in graph-structured data.
GNNs are required to learn latent patterns from a limited amount of training data to perform inferences on a vast amount of test data.
In this paper, we push one step forward on the ensemble learning of GNNs with improved accuracy, robustness, and adversarial attacks.
arXiv Detail & Related papers (2023-03-20T18:24:01Z) - Training Graph Neural Networks on Growing Stochastic Graphs [114.75710379125412]
Graph Neural Networks (GNNs) rely on graph convolutions to exploit meaningful patterns in networked data.
We propose to learn GNNs on very large graphs by leveraging the limit object of a sequence of growing graphs, the graphon.
arXiv Detail & Related papers (2022-10-27T16:00:45Z) - MentorGNN: Deriving Curriculum for Pre-Training GNNs [61.97574489259085]
We propose an end-to-end model named MentorGNN that aims to supervise the pre-training process of GNNs across graphs.
We shed new light on the problem of domain adaption on relational data (i.e., graphs) by deriving a natural and interpretable upper bound on the generalization error of the pre-trained GNNs.
arXiv Detail & Related papers (2022-08-21T15:12:08Z) - BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and
Preprocessing [0.0]
Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data.
Existing systems are inefficient to train large graphs with billions of nodes and edges with GPUs.
This paper proposes BGL, a distributed GNN training system designed to address the bottlenecks with a few key ideas.
arXiv Detail & Related papers (2021-12-16T00:37:37Z) - A Unified Lottery Ticket Hypothesis for Graph Neural Networks [82.31087406264437]
We present a unified GNN sparsification (UGS) framework that simultaneously prunes the graph adjacency matrix and the model weights.
We further generalize the popular lottery ticket hypothesis to GNNs for the first time, by defining a graph lottery ticket (GLT) as a pair of core sub-dataset and sparse sub-network.
arXiv Detail & Related papers (2021-02-12T21:52:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.