Entropy Aware Training for Fast and Accurate Distributed GNN
- URL: http://arxiv.org/abs/2311.02399v1
- Date: Sat, 4 Nov 2023 13:11:49 GMT
- Title: Entropy Aware Training for Fast and Accurate Distributed GNN
- Authors: Dhruv Deshmukh (1), Gagan Raj Gupta (1), Manisha Chawla (1), Vishwesh
Jatala (1), Anirban Haldar (1) ((1) Department of CSE, IIT Bhilai, India)
- Abstract summary: Several distributed frameworks have been developed to scale Graph Neural Networks (GNNs) on billion-size graphs.
We develop techniques that reduce training time and improve accuracy.
We implement our algorithms on the DistDGL framework and observe that our training techniques scale much better than the existing training approach.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several distributed frameworks have been developed to scale Graph Neural
Networks (GNNs) on billion-size graphs. On several benchmarks, we observe that
the graph partitions generated by these frameworks have heterogeneous data
distributions and class imbalance, which affect convergence and result in
lower performance than centralized implementations. We holistically address
these challenges and develop techniques that reduce training time and improve
accuracy. We develop an Edge-Weighted partitioning technique to improve the
micro average F1 score (accuracy) by minimizing the total entropy. Furthermore,
we add an asynchronous personalization phase that adapts each compute-host's
model to its local data distribution. We design a class-balanced sampler that
considerably speeds up convergence. We implemented our algorithms on the
DistDGL framework and observed that our training techniques scale much better
than the existing training approach. We achieved a 2-3x speedup in training
time and a 4% average improvement in micro-F1 score on 5 large graph
benchmarks compared to the standard baselines.
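The abstract names three techniques: an edge-weighted partitioner that minimizes total entropy, an asynchronous personalization phase, and a class-balanced sampler. The snippet below is only a minimal sketch of the first and last of these, not the authors' DistDGL implementation. The helper names (partition_label_entropy, class_balanced_sampler), the size-weighted definition of "total entropy", and the inverse-frequency weighting are assumptions made for illustration.

```python
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler


def partition_label_entropy(labels, part_ids, num_classes):
    """Size-weighted label entropy summed over partitions (one plausible
    reading of the 'total entropy' the partitioner minimizes)."""
    labels = np.asarray(labels)
    part_ids = np.asarray(part_ids)
    n = len(labels)
    total = 0.0
    for p in np.unique(part_ids):
        part_labels = labels[part_ids == p]
        counts = np.bincount(part_labels, minlength=num_classes).astype(float)
        probs = counts[counts > 0] / counts.sum()
        total += (len(part_labels) / n) * -(probs * np.log(probs)).sum()
    return total


def class_balanced_sampler(labels, num_samples):
    """Inverse-frequency node sampler so rare classes appear in mini-batches
    roughly as often as frequent ones."""
    labels_t = torch.as_tensor(labels)
    class_counts = torch.bincount(labels_t).clamp(min=1).float()
    weights = 1.0 / class_counts[labels_t]  # one weight per training node
    return WeightedRandomSampler(weights, num_samples, replacement=True)


# Toy example: a skewed 3-class node set split across two partitions.
labels = [0, 0, 0, 0, 1, 1, 2, 0, 0, 1]
parts = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(partition_label_entropy(labels, parts, num_classes=3))
print(list(class_balanced_sampler(labels, num_samples=8)))  # indices skew toward rare classes 1 and 2
```

In a distributed run, each compute host would draw mini-batches from its own partition with such a sampler, which is consistent in spirit with, though not identical to, the class-balanced sampling the abstract describes.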
Related papers
- MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs [11.026326555186333]
This paper develops a parameterized continuous prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework.
It demonstrates about 15-40% improvement in end-to-end training performance on the National Energy Research Scientific Computing Center's (NERSC) Perlmutter supercomputer.
arXiv Detail & Related papers (2024-10-30T05:10:38Z)
- CDFGNN: a Systematic Design of Cache-based Distributed Full-Batch Graph Neural Network Training with Communication Reduction [7.048300785744331]
Graph neural network training is mainly categorized into mini-batch and full-batch training methods.
In the distributed cluster, frequent remote accesses of features and gradients lead to huge communication overhead.
We introduce the cache-based distributed full-batch graph neural network training framework (CDFGNN).
Our results indicate that CDFGNN has great potential in accelerating distributed full-batch GNN training tasks.
arXiv Detail & Related papers (2024-08-01T01:57:09Z)
- Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN).
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
- Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
- Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation [23.018715954992352]
We present a simplified framework for distributed GNN training that does not rely on the aforementioned costly operations.
Specifically, our framework assembles independent trainers, each of which asynchronously learns a local model on locally-available parts of the training graph.
In experiments on social and e-commerce networks with up to 1.3 billion edges, our proposed RandomTMA and SuperTMA approaches achieve state-of-the-art performance and 2.31x speedup compared to the fastest baseline.
arXiv Detail & Related papers (2023-05-17T01:49:44Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training approach, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- Optimal Propagation for Graph Neural Networks [51.08426265813481]
We propose a bi-level optimization approach for learning the optimal graph structure.
We also explore a low-rank approximation model for further reducing the time complexity.
arXiv Detail & Related papers (2022-05-06T03:37:00Z)
- Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes.
We propose a metric that quantifies the ability of a graph to mix the current gradients.
Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z)
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler; a generic sketch of such a sampling loop appears after this list.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
arXiv Detail & Related papers (2021-10-16T02:41:35Z)
- Distributed Optimization of Graph Convolutional Network using Subgraph Variance [8.510726499008204]
We propose a Graph Augmentation-based Distributed GCN framework (GAD).
GAD has two main components: GAD-Partition and GAD-r.
Our framework significantly reduces communication overhead (by 50%), improves convergence speed (2X), and yields a slight gain in accuracy (0.45%) with minimal redundancy compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-10-06T18:01:47Z)
- Decentralized Statistical Inference with Unrolled Graph Neural Networks [26.025935320024665]
We propose a learning-based framework, which unrolls decentralized optimization algorithms into graph neural networks (GNNs).
By minimizing the recovery error via end-to-end training, this learning-based framework resolves the model mismatch issue.
Our convergence analysis reveals that the learned model parameters may accelerate the convergence and reduce the recovery error to a large extent.
arXiv Detail & Related papers (2021-04-04T07:52:34Z)
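Several of the entries above, as well as the DistDGL baseline used in the main paper, rely on mini-batch training with layer-wise neighborhood sampling. The following is a generic, minimal sketch of that loop using DGL's dataloading API (assuming DGL 0.8+ with a PyTorch backend); the fanouts, the "feat" and "label" field names, and the model(blocks, x) signature are placeholders, not taken from any specific paper above.

```python
import dgl
import torch.nn.functional as F


def train_one_epoch(g, train_nids, model, optimizer, fanouts=(10, 5), batch_size=1024):
    """One epoch of mini-batch GNN training with layer-wise neighborhood sampling."""
    sampler = dgl.dataloading.NeighborSampler(list(fanouts))  # one fanout per GNN layer
    loader = dgl.dataloading.DataLoader(
        g, train_nids, sampler,
        batch_size=batch_size, shuffle=True, drop_last=False)
    for input_nodes, output_nodes, blocks in loader:
        x = blocks[0].srcdata["feat"]    # features of the sampled input frontier
        y = blocks[-1].dstdata["label"]  # labels of the seed (output) nodes
        loss = F.cross_entropy(model(blocks, x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Bounding the fanout at each layer keeps the per-batch computation and data movement predictable, which is the property the sampling-and-pipelining work above exploits.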
This list is automatically generated from the titles and abstracts of the papers on this site.