Grappa: Gradient-Only Communication for Scalable Graph Neural Network Training
- URL: http://arxiv.org/abs/2602.01872v1
- Date: Mon, 02 Feb 2026 09:44:12 GMT
- Title: Grappa: Gradient-Only Communication for Scalable Graph Neural Network Training
- Authors: Chongyang Xu, Christoph Siebenbrunner, Laurent Bindschaedler
- Abstract summary: Grappa is a distributed GNN training framework that enforces gradient-only communication. During each iteration, partitions train in isolation and exchange only gradients for the global update. We show that Grappa trains GNNs 4 times faster on average (up to 13 times) than state-of-the-art systems.
- Score: 2.3139750881704044
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-partition edges dominate the cost of distributed GNN training: fetching remote features and activations per iteration overwhelms the network as graphs deepen and partition counts grow. Grappa is a distributed GNN training framework that enforces gradient-only communication: during each iteration, partitions train in isolation and exchange only gradients for the global update. To recover accuracy lost to isolation, Grappa (i) periodically repartitions to expose new neighborhoods and (ii) applies a lightweight coverage-corrected gradient aggregation inspired by importance sampling. We prove the corrected estimator is asymptotically unbiased under standard support and boundedness assumptions, and we derive a batch-level variant for compatibility with common deep-learning packages that minimizes mean-squared deviation from the ideal node-level correction. We also introduce a shrinkage version that improves stability in practice. Empirical results on real and synthetic graphs show that Grappa trains GNNs 4 times faster on average (up to 13 times) than state-of-the-art systems, achieves better accuracy especially for deeper models, and sustains training at the trillion-edge scale on commodity hardware. Grappa is model-agnostic, supports full-graph and mini-batch training, and does not rely on high-bandwidth interconnects or caching.
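The abstract does not give the exact form of the coverage-corrected aggregation, so the following is only a minimal NumPy sketch of an importance-sampling-style reweighting with a shrinkage blend toward the plain average; the `coverage` weights, the `shrinkage` parameter, and the toy gradients are assumptions for illustration and are not Grappa's actual estimator.

```python
import numpy as np

def aggregate_gradients(local_grads, coverage, shrinkage=0.0):
    """Coverage-corrected gradient aggregation (illustrative sketch only).

    local_grads : list of np.ndarray, one local gradient per partition
    coverage    : np.ndarray of per-partition coverage values in (0, 1],
                  e.g. the assumed fraction of relevant neighborhoods a
                  partition sees under the current repartitioning
    shrinkage   : float in [0, 1]; 0 = pure coverage correction,
                  1 = plain unweighted mean (more stable, more biased)
    """
    grads = np.stack(local_grads)                     # shape (P, D)
    weights = 1.0 / coverage                          # importance-sampling-style reweighting
    weights = weights / weights.sum()                 # normalize to a convex combination
    corrected = np.tensordot(weights, grads, axes=1)  # coverage-corrected estimate, shape (D,)
    plain_mean = grads.mean(axis=0)                   # naive average of local gradients
    # Shrinkage variant: blend the corrected estimate with the naive mean.
    return (1.0 - shrinkage) * corrected + shrinkage * plain_mean

# Toy usage: 4 partitions, an 8-dimensional gradient, unequal coverage.
rng = np.random.default_rng(0)
local_grads = [rng.normal(size=8) for _ in range(4)]
coverage = np.array([0.9, 0.6, 0.75, 0.5])
update = aggregate_gradients(local_grads, coverage, shrinkage=0.2)
print(update)
```

In an actual distributed run, each partition would compute its gradient in isolation and the only per-iteration exchange would be an all-reduce of these (corrected) gradients, which matches the gradient-only communication constraint described in the abstract.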
Related papers
- Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on the client-server architecture. Non-smooth regularization is often incorporated into machine learning tasks. We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z) - Gradient Rewiring for Editable Graph Neural Network Training [84.77778876113099]
We propose a simple yet effective Gradient Rewiring method for Editable graph neural network training, named GRE.
arXiv Detail & Related papers (2024-10-21T01:01:50Z) - Distributed Training of Large Graph Neural Networks with Variable Communication Rates [71.7293735221656]
Training Graph Neural Networks (GNNs) on large graphs presents unique challenges due to the large memory and computing requirements.
Distributed GNN training, where the graph is partitioned across multiple machines, is a common approach to training GNNs on large graphs.
We introduce a variable compression scheme for reducing the communication volume in distributed GNN training without compromising the accuracy of the learned model.
arXiv Detail & Related papers (2024-06-25T14:57:38Z) - Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z) - Fast and Effective GNN Training through Sequences of Random Path Graphs [20.213843086649014]
We present GERN, a novel scalable framework for training GNNs in node classification tasks. Our method progressively refines the GNN weights on a sequence of random spanning trees suitably transformed into path graphs. The sparse nature of these path graphs substantially lightens the computational burden of GNN training.
arXiv Detail & Related papers (2023-06-07T23:12:42Z) - Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training [6.557328947642343]
Distributed full-graph training of Graph Neural Networks (GNNs) over large graphs is bandwidth-demanding and time-consuming.
This paper proposes an efficient GNN training system, AdaQP, to expedite distributed full-graph training.
arXiv Detail & Related papers (2023-06-02T09:02:09Z) - Optimal Propagation for Graph Neural Networks [51.08426265813481]
We propose a bi-level optimization approach for learning the optimal graph structure.
We also explore a low-rank approximation model for further reducing the time complexity.
arXiv Detail & Related papers (2022-05-06T03:37:00Z) - DNN gradient lossless compression: Can GenNorm be the answer? [17.37160669785566]
Gradient compression is relevant in many distributed Deep Neural Network (DNN) training scenarios.
For some networks of practical interest, the gradient entries can be well modelled as having a generalized normal (GenNorm) distribution.
arXiv Detail & Related papers (2021-11-15T08:33:10Z) - Efficient Distributed Auto-Differentiation [22.192220404846267]
Gradient-based algorithms for training large deep neural networks (DNNs) are communication-heavy.
We introduce a surprisingly simple statistic for training distributed DNNs that is more communication-friendly than the gradient.
The process provides the flexibility of averaging gradients during backpropagation, enabling novel flexible training schemas.
arXiv Detail & Related papers (2021-02-18T21:46:27Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.