Provably Convergent Subgraph-wise Sampling for Fast GNN Training
- URL: http://arxiv.org/abs/2303.11081v2
- Date: Wed, 21 Aug 2024 02:54:45 GMT
- Title: Provably Convergent Subgraph-wise Sampling for Fast GNN Training
- Authors: Jie Wang, Zhihao Shi, Xize Liang, Defu Lian, Shuiwang Ji, Bin Li, Enhong Chen, Feng Wu
- Abstract summary: We propose a novel subgraph-wise sampling method with a convergence guarantee, namely Local Message Compensation (LMC).
LMC retrieves the discarded messages in backward passes based on a message passing formulation of backward passes.
Experiments on large-scale benchmarks demonstrate that LMC is significantly faster than state-of-the-art subgraph-wise sampling methods.
- Score: 122.68566970275683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Subgraph-wise sampling -- a promising class of mini-batch training techniques for graph neural networks (GNNs) -- is critical for real-world applications. During the message passing (MP) in GNNs, subgraph-wise sampling methods discard messages outside the mini-batches in backward passes to avoid the well-known neighbor explosion problem, i.e., the exponentially increasing dependencies of nodes with the number of MP iterations. However, discarding messages may sacrifice gradient estimation accuracy, posing significant challenges to both convergence analysis and convergence speed. To address this challenge, we propose a novel subgraph-wise sampling method with a convergence guarantee, namely Local Message Compensation (LMC). To the best of our knowledge, LMC is the first subgraph-wise sampling method with provable convergence. The key idea is to retrieve the discarded messages in backward passes based on a message passing formulation of backward passes. By efficient and effective compensations for the discarded messages in both forward and backward passes, LMC computes accurate mini-batch gradients and thus accelerates convergence. Moreover, LMC is applicable to various MP-based GNN architectures, including convolutional GNNs (finite message passing iterations with different layers) and recurrent GNNs (infinite message passing iterations with a shared layer). Experiments on large-scale benchmarks demonstrate that LMC is significantly faster than state-of-the-art subgraph-wise sampling methods.
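To make the compensation idea concrete, below is a minimal sketch of forward-pass message compensation in the spirit of LMC. The single mean-aggregation layer, the toy graph, and all variable names are illustrative assumptions, not the paper's exact algorithm; LMC additionally compensates the backward pass with cached gradients, which is what yields the convergence guarantee.

```python
import numpy as np

# Minimal sketch of forward-pass message compensation for subgraph-wise
# mini-batch GNN training (in the spirit of LMC; simplified, not the
# paper's exact algorithm).  Out-of-batch neighbors contribute through
# cached "historical" embeddings instead of being recomputed, so the
# neighbor-explosion problem is avoided while no messages are discarded.

num_nodes, dim = 6, 4
rng = np.random.default_rng(0)
X = rng.normal(size=(num_nodes, dim))   # input node features (assumed)
W = rng.normal(size=(dim, dim)) * 0.1   # one GNN layer's weights (assumed)
adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2], 5: [3]}

hist = X @ W                            # historical embedding cache

def layer_forward(batch):
    """One mean-aggregation layer over the mini-batch `batch`."""
    batch = set(batch)
    out = hist.copy()
    for v in batch:
        msgs = []
        for u in adj[v]:
            if u in batch:
                msgs.append(X[u] @ W)   # exact in-batch message
            else:
                msgs.append(hist[u])    # compensated via the cache
        out[v] = np.mean(msgs, axis=0)
        hist[v] = out[v]                # refresh cache for later batches
    return out

H = layer_forward(batch=[0, 1, 2])
print(H.shape)  # (6, 4); rows 0-2 updated exactly, the rest from cache
```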
Related papers
- Nearest Neighbor Speculative Decoding for LLM Generation and Attribution [87.3259169631789]
Nearest Neighbor Speculative Decoding (NEST) is capable of incorporating real-world text spans of arbitrary length into the LM generations and providing attribution to their sources.
NEST significantly enhances the generation quality and attribution rate of the base LM across a variety of knowledge-intensive tasks.
In addition, NEST substantially improves the generation speed, achieving a 1.8x speedup in inference time when applied to Llama-2-Chat 70B.
arXiv Detail & Related papers (2024-05-29T17:55:03Z) - Faster Convergence with Less Communication: Broadcast-Based Subgraph Sampling for Decentralized Learning over Wireless Networks [32.914407967052114]
$\texttt{BASS}$ is a broadcast-based subgraph sampling method designed to accelerate the convergence of D-SGD.
We show that $\texttt{BASS}$ enables faster convergence with fewer transmission slots compared to existing link-based scheduling methods.
arXiv Detail & Related papers (2024-01-24T20:00:23Z) - Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo [30.4930148381328]
Diffusion-based Monte Carlo (DMC) is a method to sample from a general target distribution beyond the isoperimetric condition.
DMC suffers from high gradient complexity, resulting in an exponential dependency on the error tolerance $\epsilon$ of the obtained samples.
We propose RS-DMC, based on a novel recursion-based score estimation method.
Our algorithm is provably much faster than the popular Langevin-based algorithms.
arXiv Detail & Related papers (2024-01-12T02:33:57Z) - AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models [103.41269503488546]
Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models with user-provided concepts.
This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents.
We propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs.
It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters.
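As a hedged illustration of the adjoint sensitivity method on a toy scalar ODE (not AdjointDPM's probability-flow ODE; every name and choice here is an assumption), the sketch below differentiates a loss through an Euler-integrated ODE and checks the result against the analytic gradient.

```python
import numpy as np

# Toy adjoint-sensitivity sketch.  We differentiate L = 0.5 * x(T)^2
# through dx/dt = theta * x using the adjoint ODE da/dt = -a * df/dx,
# with dL/dtheta = integral over [0, T] of a(t) * df/dtheta dt.

theta, x0, T, n = 0.3, 1.0, 1.0, 10000
dt = T / n

# Forward pass: Euler integration (trajectory stored here for clarity;
# the adjoint method can instead reconstruct states backward in time
# to save memory, which is the point of AdjointDPM's approach).
xs = np.empty(n + 1)
xs[0] = x0
for i in range(n):
    xs[i + 1] = xs[i] + dt * theta * xs[i]

# Backward pass: integrate the adjoint and accumulate dL/dtheta.
a = xs[-1]                  # a(T) = dL/dx(T) for L = 0.5 * x(T)^2
grad = 0.0
for i in range(n, 0, -1):
    grad += dt * a * xs[i]  # df/dtheta = x
    a += dt * a * theta     # reverse-time Euler step of da/dt = -a*theta

print(grad, T * x0**2 * np.exp(2 * theta * T))  # ~= analytic gradient
```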
arXiv Detail & Related papers (2023-07-20T09:06:21Z) - LMC: Fast Training of GNNs via Subgraph Sampling with Provable Convergence [8.630426703200541]
We propose a novel subgraph-wise sampling method with a convergence guarantee, namely Local Message Compensation (LMC).
LMC retrieves the discarded messages in backward passes based on a message passing formulation of backward passes.
LMC significantly outperforms state-of-the-art subgraph-wise sampling methods in terms of efficiency.
arXiv Detail & Related papers (2023-02-02T07:52:34Z) - Calibrate and Debias Layer-wise Sampling for Graph Convolutional Networks [39.56471534442315]
This paper revisits layer-wise sampling from a matrix approximation perspective.
We propose a new principle for constructing sampling probabilities and an efficient debiasing algorithm.
Improvements are demonstrated by extensive analyses of estimation variance and experiments on common benchmarks.
arXiv Detail & Related papers (2022-06-01T15:52:06Z) - VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization [70.8567058758375]
VQ-GNN is a universal framework to scale up any convolution-based GNN using Vector Quantization (VQ) without compromising performance.
Our framework avoids the "neighbor explosion" problem of GNNs using quantized representations combined with a low-rank version of the graph convolution matrix.
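A minimal sketch of the vector-quantization idea follows, assuming a toy graph, mean aggregation, and a fixed codebook: out-of-batch neighbors contribute their nearest codeword rather than their exact feature, so a mini-batch never pulls in the full multi-hop neighborhood. This illustrates the general VQ trick, not VQ-GNN's actual low-rank formulation.

```python
import numpy as np

# Hedged sketch of message passing through a small codebook (not
# VQ-GNN's exact algorithm).  Exact features are used for in-batch
# neighbors; out-of-batch neighbors are replaced by codewords.

rng = np.random.default_rng(1)
num_nodes, dim, k = 8, 4, 3
X = rng.normal(size=(num_nodes, dim))
codebook = rng.normal(size=(k, dim))  # stand-in for learned codewords
adj = {0: [1, 5], 1: [0, 6], 2: [3, 7], 3: [2, 4],
       4: [3], 5: [0], 6: [1], 7: [2]}

def quantize(x):
    """Index of the codeword nearest to feature vector x."""
    return int(np.argmin(((codebook - x) ** 2).sum(axis=1)))

def aggregate(batch):
    batch = set(batch)
    out = {}
    for v in batch:
        msgs = [X[u] if u in batch else codebook[quantize(X[u])]
                for u in adj[v]]
        out[v] = np.mean(msgs, axis=0)  # mean aggregation for simplicity
    return out

print(aggregate(batch=[0, 1, 2, 3]))
```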
arXiv Detail & Related papers (2021-10-27T11:48:50Z) - MG-GCN: Fast and Effective Learning with Mix-grained Aggregators for Training Large Graph Convolutional Networks [20.07942308916373]
Graph convolutional networks (GCNs) generate the embeddings of nodes by aggregating the information of their neighbors layer by layer.
The high computational and memory cost of GCNs makes it infeasible for training on large graphs.
A new model, named Mix-grained GCN (MG-GCN), achieves state-of-the-art performance in terms of accuracy, training speed, convergence speed, and memory cost.
arXiv Detail & Related papers (2020-11-17T14:51:57Z) - Solving Sparse Linear Inverse Problems in Communication Systems: A Deep Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase.
arXiv Detail & Related papers (2020-10-29T06:32:53Z) - MLE-guided parameter search for task loss minimization in neural sequence modeling [83.83249536279239]
Neural autoregressive sequence models are used to generate sequences in a variety of natural language processing (NLP) tasks.
We propose maximum likelihood guided parameter search (MGS), which samples from a distribution over update directions that is a mixture of random search around the current parameters and around the maximum likelihood gradient.
Our experiments show that MGS is capable of optimizing sequence-level losses, with substantial reductions in repetition and non-termination in sequence completion, and similar improvements to those of minimum risk training in machine translation.
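Below is a hedged sketch of the mixture-of-proposals idea: candidate updates are drawn either around the current parameters (random search) or around the MLE gradient step. The greedy lowest-loss selection and the stand-in loss and gradient functions are simplifications assumed here; MGS itself uses an importance-weighted update.

```python
import numpy as np

# Hedged sketch of MGS's proposal mixture (simplified; MGS weights
# candidates rather than greedily selecting one).

rng = np.random.default_rng(2)

def task_loss(theta):          # stand-in for a sequence-level loss
    return np.sum((theta - 1.0) ** 2)

def mle_grad(theta):           # stand-in for the MLE gradient
    return 2.0 * (theta - 1.5)

theta, lr, sigma, n_cand = np.zeros(3), 0.1, 0.05, 8
for step in range(50):
    g = mle_grad(theta)
    cands = []
    for i in range(n_cand):
        if i % 2 == 0:         # random search around current parameters
            d = sigma * rng.normal(size=theta.shape)
        else:                  # random search around the MLE gradient step
            d = -lr * g + sigma * rng.normal(size=theta.shape)
        cands.append(theta + d)
    theta = min(cands, key=task_loss)  # greedy selection (simplified)

print(theta, task_loss(theta))  # converges near the task-loss optimum
```

Note how the task-loss optimum (1.0) deliberately differs from the MLE target (1.5): once the MLE-step proposals stop helping, the random-search proposals keep reducing the sequence-level loss, mirroring the paper's motivation.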
arXiv Detail & Related papers (2020-06-04T22:21:22Z)