Scalable Graph Generative Modeling via Substructure Sequences
- URL: http://arxiv.org/abs/2505.16130v1
- Date: Thu, 22 May 2025 02:16:34 GMT
- Title: Scalable Graph Generative Modeling via Substructure Sequences
- Authors: Zehong Wang, Zheyuan Zhang, Tianyi Ma, Chuxu Zhang, Yanfang Ye
- Abstract summary: We introduce Generative Graph Pattern Machine (G$^2$PM), a generative Transformer pre-training framework for graphs. G$^2$PM represents graph instances as sequences of substructures, and employs generative pre-training over the sequences to learn generalizable, transferable representations. On the ogbn-arxiv benchmark, G$^2$PM continues to improve with model sizes up to 60M parameters, outperforming prior generative approaches that plateau at significantly smaller scales.
- Score: 37.64864614356634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph neural networks (GNNs) have been predominantly driven by message-passing, where node representations are iteratively updated via local neighborhood aggregation. Despite their success, message-passing suffers from fundamental limitations -- including constrained expressiveness, over-smoothing, over-squashing, and limited capacity to model long-range dependencies. These issues hinder scalability: increasing data size or model size often fails to yield improved performance, limiting the viability of GNNs as backbones for graph foundation models. In this work, we explore pathways beyond message-passing and introduce Generative Graph Pattern Machine (G$^2$PM), a generative Transformer pre-training framework for graphs. G$^2$PM represents graph instances (nodes, edges, or entire graphs) as sequences of substructures, and employs generative pre-training over the sequences to learn generalizable, transferable representations. Empirically, G$^2$PM demonstrates strong scalability: on the ogbn-arxiv benchmark, it continues to improve with model sizes up to 60M parameters, outperforming prior generative approaches that plateau at significantly smaller scales (e.g., 3M). In addition, we systematically analyze the model design space, highlighting key architectural choices that contribute to its scalability and generalization. Across diverse tasks -- including node classification, graph classification, and transfer learning -- G$^2$PM consistently outperforms strong baselines, establishing a compelling foundation for scalable graph learning. The code and dataset are available at https://github.com/Zehong-Wang/G2PM.
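As a rough illustration of the idea sketched in the abstract (tokenize a graph instance into a sequence of substructure tokens, then pre-train a Transformer generatively over those sequences), the following minimal Python sketch uses a hypothetical random-walk tokenizer and a hashed degree-profile vocabulary. These details are illustrative assumptions only; they are not taken from the paper or the released G2PM code.

```python
# Minimal sketch (assumed design, not the authors' implementation): represent a node
# as a sequence of substructure tokens and pre-train a Transformer with next-token prediction.
import random
import torch
import torch.nn as nn

def sample_substructure_sequence(adj, start, seq_len=16, walk_len=3):
    """Tokenize a node by sampling short random walks around it and hashing each walk's
    degree profile into a small vocabulary (a crude stand-in for real substructure ids)."""
    tokens = []
    for _ in range(seq_len):
        node, walk = start, [start]
        for _ in range(walk_len):
            neighbors = adj[node]
            if not neighbors:
                break
            node = random.choice(neighbors)
            walk.append(node)
        degree_profile = tuple(len(adj[v]) for v in walk)
        tokens.append(hash(degree_profile) % 1000)  # token id in [0, 999]
    return tokens

class SubstructureGPT(nn.Module):
    """Decoder-style Transformer that autoregressively predicts the next substructure token."""
    def __init__(self, vocab_size=1000, d_model=128, n_layers=4, n_heads=4, max_len=64):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        seq_len = x.size(1)
        h = self.tok(x) + self.pos(torch.arange(seq_len, device=x.device))
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf"), device=x.device), diagonal=1)
        return self.head(self.encoder(h, mask=causal))

if __name__ == "__main__":
    # Toy pre-training loop on a random graph: next-token prediction over substructure sequences.
    adj = {i: [j for j in range(20) if j != i and random.random() < 0.2] for i in range(20)}
    model = SubstructureGPT()
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    for step in range(5):
        batch = torch.tensor([sample_substructure_sequence(adj, random.randrange(20)) for _ in range(8)])
        logits = model(batch[:, :-1])  # predict tokens 1..T from tokens 0..T-1
        loss = nn.functional.cross_entropy(logits.reshape(-1, logits.size(-1)), batch[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        print(f"step {step}: loss {loss.item():.3f}")
```

The actual substructure vocabulary, tokenizer, and pre-training objective are described in the paper and the repository linked above.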
Related papers
- Generating Large Semi-Synthetic Graphs of Any Size [0.4419843514606336]
Graph generation is an important area in network science. Recent advancements in deep learning have enabled data-driven methods to learn and generate graphs. We propose Latent Graph Sampling Generation (LGSG) to generate graphs of varying sizes without retraining.
arXiv Detail & Related papers (2025-07-02T21:46:28Z)
- Neural Graph Pattern Machine [50.78679002846741]
We propose the Neural Graph Pattern Machine (GPM), a framework designed to learn directly from graph patterns. GPM efficiently extracts and encodes substructures while identifying the most relevant ones for downstream tasks.
arXiv Detail & Related papers (2025-01-30T20:37:47Z)
- Revisiting Graph Neural Networks on Graph-level Tasks: Comprehensive Experiments, Analysis, and Improvements [54.006506479865344]
We propose a unified evaluation framework for graph-level Graph Neural Networks (GNNs). This framework provides a standardized setting to evaluate GNNs across diverse datasets. We also propose a novel GNN model with enhanced expressivity and generalization capabilities.
arXiv Detail & Related papers (2025-01-01T08:48:53Z)
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
- What Can We Learn from State Space Models for Machine Learning on Graphs? [11.38076877943004]
We propose Graph State Space Convolution (GSSC) as a principled extension of State Space Models (SSMs) to graph-structured data.
By leveraging global permutation-equivariant set aggregation and factorizable graph kernels, GSSC preserves all three advantages of SSMs.
Our findings highlight the potential of GSSC as a powerful and scalable model for graph machine learning.
arXiv Detail & Related papers (2024-06-09T15:03:36Z)
- Implicit Graph Neural Diffusion Networks: Convergence, Generalization, and Over-Smoothing [7.984586585987328]
Implicit Graph Neural Networks (GNNs) have achieved significant success in addressing graph learning problems.
We introduce a geometric framework for designing implicit graph diffusion layers based on a parameterized graph Laplacian operator.
We show how implicit GNN layers can be viewed as the fixed-point equation of a Dirichlet energy minimization problem.
arXiv Detail & Related papers (2023-08-07T05:22:33Z)
- Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication [100.51884192970499]
GNNs are a powerful family of neural networks for learning over graphs.
Scaling GNNs either by deepening or widening suffers from prevalent issues of unhealthy gradients, over-smoothing, and information squashing.
We propose not to deepen or widen current GNNs, but instead present a data-centric perspective of model soups tailored for GNNs.
arXiv Detail & Related papers (2023-06-18T03:33:46Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks [52.566735716983956]
We propose a graph gradual pruning framework termed CGP to dynamically prune GNNs.
Unlike LTH-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs.
Our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
arXiv Detail & Related papers (2022-07-18T14:23:31Z)
- GNNAutoScale: Scalable and Expressive Graph Neural Networks via Historical Embeddings [51.82434518719011]
GNNAutoScale (GAS) is a framework for scaling arbitrary message-passing GNNs to large graphs.
GAS prunes entire sub-trees of the computation graph by utilizing historical embeddings from prior training iterations.
GAS reaches state-of-the-art performance on large-scale graphs.
arXiv Detail & Related papers (2021-06-10T09:26:56Z)
- Multivariate Time Series Classification with Hierarchical Variational Graph Pooling [23.66868187446734]
Existing deep learning-based MTSC techniques are primarily concerned with the temporal dependency of single time series.
We propose a novel graph pooling-based framework MTPool to obtain the expressive global representation of MTS.
Experiments on ten benchmark datasets show that MTPool outperforms state-of-the-art strategies on the MTSC task.
arXiv Detail & Related papers (2020-10-12T12:36:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.