Provably Convergent Subgraph-wise Sampling for Fast GNN Training
- URL: http://arxiv.org/abs/2303.11081v2
- Date: Wed, 21 Aug 2024 02:54:45 GMT
- Title: Provably Convergent Subgraph-wise Sampling for Fast GNN Training
- Authors: Jie Wang, Zhihao Shi, Xize Liang, Defu Lian, Shuiwang Ji, Bin Li, Enhong Chen, Feng Wu
- Abstract summary: We propose a novel subgraph-wise sampling method with a convergence guarantee, namely Local Message Compensation (LMC).
LMC retrieves the discarded messages in backward passes based on a message passing formulation of backward passes.
Experiments on large-scale benchmarks demonstrate that LMC is significantly faster than state-of-the-art subgraph-wise sampling methods.
- Score: 122.68566970275683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Subgraph-wise sampling -- a promising class of mini-batch training techniques for graph neural networks (GNNs) -- is critical for real-world applications. During the message passing (MP) in GNNs, subgraph-wise sampling methods discard messages outside the mini-batches in backward passes to avoid the well-known neighbor explosion problem, i.e., the exponentially increasing dependencies of nodes with the number of MP iterations. However, discarding messages may sacrifice gradient estimation accuracy, posing significant challenges to both convergence analysis and convergence speed. To address this challenge, we propose a novel subgraph-wise sampling method with a convergence guarantee, namely Local Message Compensation (LMC). To the best of our knowledge, LMC is the first subgraph-wise sampling method with provable convergence. The key idea is to retrieve the discarded messages in backward passes based on a message passing formulation of backward passes. By efficient and effective compensations for the discarded messages in both forward and backward passes, LMC computes accurate mini-batch gradients and thus accelerates convergence. Moreover, LMC is applicable to various MP-based GNN architectures, including convolutional GNNs (finite message passing iterations with different layers) and recurrent GNNs (infinite message passing iterations with a shared layer). Experiments on large-scale benchmarks demonstrate that LMC is significantly faster than state-of-the-art subgraph-wise sampling methods.
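To make the compensation idea concrete, below is a minimal sketch of forward-pass message compensation in the spirit of LMC. The single mean-aggregation layer, the toy graph, and all variable names are illustrative assumptions, not the paper's exact algorithm; LMC additionally compensates the backward pass with cached gradients, which is what yields the convergence guarantee.

```python
import numpy as np

# Minimal sketch of forward-pass message compensation for subgraph-wise
# mini-batch GNN training (in the spirit of LMC; simplified, not the
# paper's exact algorithm).  Out-of-batch neighbors contribute through
# cached "historical" embeddings instead of being recomputed, so the
# neighbor-explosion problem is avoided while no messages are discarded.

num_nodes, dim = 6, 4
rng = np.random.default_rng(0)
X = rng.normal(size=(num_nodes, dim))   # input node features (assumed)
W = rng.normal(size=(dim, dim)) * 0.1   # one GNN layer's weights (assumed)
adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2], 5: [3]}

hist = X @ W                            # historical embedding cache

def layer_forward(batch):
    """One mean-aggregation layer over the mini-batch `batch`."""
    batch = set(batch)
    out = hist.copy()
    for v in batch:
        msgs = []
        for u in adj[v]:
            if u in batch:
                msgs.append(X[u] @ W)   # exact in-batch message
            else:
                msgs.append(hist[u])    # compensated via the cache
        out[v] = np.mean(msgs, axis=0)
        hist[v] = out[v]                # refresh cache for later batches
    return out

H = layer_forward(batch=[0, 1, 2])
print(H.shape)  # (6, 4); rows 0-2 updated exactly, the rest from cache
```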
Related papers
- Nearest Neighbor Speculative Decoding for LLM Generation and Attribution [87.3259169631789]
Nearest Neighbor Speculative Decoding (NEST) is capable of incorporating real-world text spans of arbitrary length into the LM generations and providing attribution to their sources.
NEST significantly enhances the generation quality and attribution rate of the base LM across a variety of knowledge-intensive tasks.
In addition, NEST substantially improves the generation speed, achieving a 1.8x speedup in inference time when applied to Llama-2-Chat 70B.
arXiv Detail & Related papers (2024-05-29T17:55:03Z) - Faster Convergence with Less Communication: Broadcast-Based Subgraph Sampling for Decentralized Learning over Wireless Networks [32.914407967052114]
$\texttt{BASS}$ is a broadcast-based subgraph sampling method designed to accelerate the convergence of D-SGD.
We show that $\texttt{BASS}$ enables faster convergence with fewer transmission slots compared to existing link-based scheduling methods.
arXiv Detail & Related papers (2024-01-24T20:00:23Z) - Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo [30.4930148381328]
Diffusion-based Monte Carlo (DMC) is a method to sample from a general target distribution beyond the isoperimetric condition.
DMC suffers from high gradient complexity, resulting in an exponential dependency on the error tolerance $\epsilon$ of the obtained samples.
We propose RS-DMC, based on a novel recursion-based score estimation method.
Our algorithm is provably much faster than the popular Langevin-based algorithms.
arXiv Detail & Related papers (2024-01-12T02:33:57Z) - AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models [103.41269503488546]
Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models with user-provided concepts.
This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents.
We propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs.
It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters.
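As a hedged illustration of the adjoint sensitivity method on a toy scalar ODE (not AdjointDPM's probability-flow ODE; every name and choice here is an assumption), the sketch below differentiates a loss through an Euler-integrated ODE and checks the result against the analytic gradient.

```python
import numpy as np

# Toy adjoint-sensitivity sketch.  We differentiate L = 0.5 * x(T)^2
# through dx/dt = theta * x using the adjoint ODE da/dt = -a * df/dx,
# with dL/dtheta = integral over [0, T] of a(t) * df/dtheta dt.

theta, x0, T, n = 0.3, 1.0, 1.0, 10000
dt = T / n

# Forward pass: Euler integration (trajectory stored here for clarity;
# the adjoint method can instead reconstruct states backward in time
# to save memory, which is the point of AdjointDPM's approach).
xs = np.empty(n + 1)
xs[0] = x0
for i in range(n):
    xs[i + 1] = xs[i] + dt * theta * xs[i]

# Backward pass: integrate the adjoint and accumulate dL/dtheta.
a = xs[-1]                  # a(T) = dL/dx(T) for L = 0.5 * x(T)^2
grad = 0.0
for i in range(n, 0, -1):
    grad += dt * a * xs[i]  # df/dtheta = x
    a += dt * a * theta     # reverse-time Euler step of da/dt = -a*theta

print(grad, T * x0**2 * np.exp(2 * theta * T))  # ~= analytic gradient
```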
arXiv Detail & Related papers (2023-07-20T09:06:21Z) - LMC: Fast Training of GNNs via Subgraph Sampling with Provable Convergence [8.630426703200541]
We propose a novel subgraph-wise sampling method with a convergence guarantee, namely Local Message Compensation (LMC).
LMC retrieves the discarded messages in backward passes based on a message passing formulation of backward passes.
LMC significantly outperforms state-of-the-art subgraph-wise sampling methods in terms of efficiency.
arXiv Detail & Related papers (2023-02-02T07:52:34Z) - Calibrate and Debias Layer-wise Sampling for Graph Convolutional Networks [39.56471534442315]
This paper revisits layer-wise sampling from a matrix approximation perspective.
We propose a new principle for constructing sampling probabilities and an efficient debiasing algorithm.
Improvements are demonstrated by extensive analyses of estimation variance and experiments on common benchmarks.
arXiv Detail & Related papers (2022-06-01T15:52:06Z) - VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization [70.8567058758375]
VQ-GNN is a universal framework to scale up any convolution-based GNN using Vector Quantization (VQ) without compromising performance.
Our framework avoids the "neighbor explosion" problem of GNNs using quantized representations combined with a low-rank version of the graph convolution matrix.
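A minimal sketch of the vector-quantization idea follows, assuming a toy graph, mean aggregation, and a fixed codebook: out-of-batch neighbors contribute their nearest codeword rather than their exact feature, so a mini-batch never pulls in the full multi-hop neighborhood. This illustrates the general VQ trick, not VQ-GNN's actual low-rank formulation.

```python
import numpy as np

# Hedged sketch of message passing through a small codebook (not
# VQ-GNN's exact algorithm).  Exact features are used for in-batch
# neighbors; out-of-batch neighbors are replaced by codewords.

rng = np.random.default_rng(1)
num_nodes, dim, k = 8, 4, 3
X = rng.normal(size=(num_nodes, dim))
codebook = rng.normal(size=(k, dim))  # stand-in for learned codewords
adj = {0: [1, 5], 1: [0, 6], 2: [3, 7], 3: [2, 4],
       4: [3], 5: [0], 6: [1], 7: [2]}

def quantize(x):
    """Index of the codeword nearest to feature vector x."""
    return int(np.argmin(((codebook - x) ** 2).sum(axis=1)))

def aggregate(batch):
    batch = set(batch)
    out = {}
    for v in batch:
        msgs = [X[u] if u in batch else codebook[quantize(X[u])]
                for u in adj[v]]
        out[v] = np.mean(msgs, axis=0)  # mean aggregation for simplicity
    return out

print(aggregate(batch=[0, 1, 2, 3]))
```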
arXiv Detail & Related papers (2021-10-27T11:48:50Z) - MG-GCN: Fast and Effective Learning with Mix-grained Aggregators for Training Large Graph Convolutional Networks [20.07942308916373]
Graph convolutional networks (GCNs) generate the embeddings of nodes by aggregating the information of their neighbors layer by layer.
The high computational and memory cost of GCNs makes it infeasible for training on large graphs.
A new model, named Mix-grained GCN (MG-GCN), achieves state-of-the-art performance in terms of accuracy, training speed, convergence speed, and memory cost.
arXiv Detail & Related papers (2020-11-17T14:51:57Z) - Solving Sparse Linear Inverse Problems in Communication Systems: A Deep Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase.
arXiv Detail & Related papers (2020-10-29T06:32:53Z) - MLE-guided parameter search for task loss minimization in neural sequence modeling [83.83249536279239]
Neural autoregressive sequence models are used to generate sequences in a variety of natural language processing (NLP) tasks.
We propose maximum likelihood guided parameter search (MGS), which samples from a distribution over update directions that is a mixture of random search around the current parameters and around the maximum likelihood gradient.
Our experiments show that MGS is capable of optimizing sequence-level losses, with substantial reductions in repetition and non-termination in sequence completion, and similar improvements to those of minimum risk training in machine translation.
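Below is a hedged sketch of the mixture-of-proposals idea: candidate updates are drawn either around the current parameters (random search) or around the MLE gradient step. The greedy lowest-loss selection and the stand-in loss and gradient functions are simplifications assumed here; MGS itself uses an importance-weighted update.

```python
import numpy as np

# Hedged sketch of MGS's proposal mixture (simplified; MGS weights
# candidates rather than greedily selecting one).

rng = np.random.default_rng(2)

def task_loss(theta):          # stand-in for a sequence-level loss
    return np.sum((theta - 1.0) ** 2)

def mle_grad(theta):           # stand-in for the MLE gradient
    return 2.0 * (theta - 1.5)

theta, lr, sigma, n_cand = np.zeros(3), 0.1, 0.05, 8
for step in range(50):
    g = mle_grad(theta)
    cands = []
    for i in range(n_cand):
        if i % 2 == 0:         # random search around current parameters
            d = sigma * rng.normal(size=theta.shape)
        else:                  # random search around the MLE gradient step
            d = -lr * g + sigma * rng.normal(size=theta.shape)
        cands.append(theta + d)
    theta = min(cands, key=task_loss)  # greedy selection (simplified)

print(theta, task_loss(theta))  # converges near the task-loss optimum
```

Note how the task-loss optimum (1.0) deliberately differs from the MLE target (1.5): once the MLE-step proposals stop helping, the random-search proposals keep reducing the sequence-level loss, mirroring the paper's motivation.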
arXiv Detail & Related papers (2020-06-04T22:21:22Z)