Entropy Aware Training for Fast and Accurate Distributed GNN
- URL: http://arxiv.org/abs/2311.02399v1
- Date: Sat, 4 Nov 2023 13:11:49 GMT
- Title: Entropy Aware Training for Fast and Accurate Distributed GNN
- Authors: Dhruv Deshmukh (1), Gagan Raj Gupta (1), Manisha Chawla (1), Vishwesh
Jatala (1), Anirban Haldar (1) ((1) Department of CSE, IIT Bhilai, India)
- Abstract summary: Several distributed frameworks have been developed to scale Graph Neural Networks (GNNs) on billion-size graphs.
We develop techniques that reduce training time and improve accuracy.
We implement our algorithms on the DistDGL framework and observe that our training techniques scale much better than the existing training approach.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several distributed frameworks have been developed to scale Graph Neural
Networks (GNNs) on billion-size graphs. On several benchmarks, we observe that
the graph partitions generated by these frameworks have heterogeneous data
distributions and class imbalance, which affect convergence and result in
lower performance than centralized implementations. We holistically address
these challenges and develop techniques that reduce training time and improve
accuracy. We develop an Edge-Weighted partitioning technique to improve the
micro average F1 score (accuracy) by minimizing the total entropy. Furthermore,
we add an asynchronous personalization phase that adapts each compute-host's
model to its local data distribution. We design a class-balanced sampler that
considerably speeds up convergence. We implemented our algorithms on the
DistDGL framework and observed that our training techniques scale much better
than the existing training approach. We achieved a 2-3x speedup in training
time and a 4% average improvement in micro-F1 score on 5 large graph
benchmarks compared to the standard baselines.
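The abstract names three techniques: an edge-weighted partitioner that minimizes total entropy, an asynchronous personalization phase, and a class-balanced sampler. The snippet below is only a minimal sketch of the first and last of these, not the authors' DistDGL implementation. The helper names (partition_label_entropy, class_balanced_sampler), the size-weighted definition of "total entropy", and the inverse-frequency weighting are assumptions made for illustration.

```python
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler


def partition_label_entropy(labels, part_ids, num_classes):
    """Size-weighted label entropy summed over partitions (one plausible
    reading of the 'total entropy' the partitioner minimizes)."""
    labels = np.asarray(labels)
    part_ids = np.asarray(part_ids)
    n = len(labels)
    total = 0.0
    for p in np.unique(part_ids):
        part_labels = labels[part_ids == p]
        counts = np.bincount(part_labels, minlength=num_classes).astype(float)
        probs = counts[counts > 0] / counts.sum()
        total += (len(part_labels) / n) * -(probs * np.log(probs)).sum()
    return total


def class_balanced_sampler(labels, num_samples):
    """Inverse-frequency node sampler so rare classes appear in mini-batches
    roughly as often as frequent ones."""
    labels_t = torch.as_tensor(labels)
    class_counts = torch.bincount(labels_t).clamp(min=1).float()
    weights = 1.0 / class_counts[labels_t]  # one weight per training node
    return WeightedRandomSampler(weights, num_samples, replacement=True)


# Toy example: a skewed 3-class node set split across two partitions.
labels = [0, 0, 0, 0, 1, 1, 2, 0, 0, 1]
parts = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(partition_label_entropy(labels, parts, num_classes=3))
print(list(class_balanced_sampler(labels, num_samples=8)))  # indices skew toward rare classes 1 and 2
```

In a distributed run, each compute host would draw mini-batches from its own partition with such a sampler, which is consistent in spirit with, though not identical to, the class-balanced sampling the abstract describes.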
Related papers
- MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs [11.026326555186333]
This paper develops a parameterized continuous prefetch and eviction scheme on top of the state-of-the-art Amazon DistDGL distributed GNN framework.
It demonstrates about 15-40% improvement in end-to-end training performance on the National Energy Research Scientific Computing Center's (NERSC) Perlmutter supercomputer.
arXiv Detail & Related papers (2024-10-30T05:10:38Z)
- CDFGNN: a Systematic Design of Cache-based Distributed Full-Batch Graph Neural Network Training with Communication Reduction [7.048300785744331]
Graph neural network training is mainly categorized into mini-batch and full-batch training methods.
In the distributed cluster, frequent remote accesses of features and gradients lead to huge communication overhead.
We introduce the cache-based distributed full-batch graph neural network training framework (CDFGNN).
Our results indicate that CDFGNN has great potential in accelerating distributed full-batch GNN training tasks.
arXiv Detail & Related papers (2024-08-01T01:57:09Z)
- Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN).
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
- Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
- Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation [23.018715954992352]
We present a simplified framework for distributed GNN training that does not rely on the aforementioned costly operations.
Specifically, our framework assembles independent trainers, each of which asynchronously learns a local model on locally-available parts of the training graph.
In experiments on social and e-commerce networks with up to 1.3 billion edges, our proposed RandomTMA and SuperTMA approaches achieve state-of-the-art performance and 2.31x speedup compared to the fastest baseline.
arXiv Detail & Related papers (2023-05-17T01:49:44Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training approach, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- Optimal Propagation for Graph Neural Networks [51.08426265813481]
We propose a bi-level optimization approach for learning the optimal graph structure.
We also explore a low-rank approximation model for further reducing the time complexity.
arXiv Detail & Related papers (2022-05-06T03:37:00Z)
- Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes.
We propose a metric that quantifies the ability of a graph to mix the current gradients.
Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z)
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler; a generic sketch of such a sampling loop appears after this list.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
arXiv Detail & Related papers (2021-10-16T02:41:35Z)
- Distributed Optimization of Graph Convolutional Network using Subgraph Variance [8.510726499008204]
We propose a Graph Augmentation-based Distributed GCN framework (GAD).
GAD has two main components: GAD-Partition and GAD-r.
Our framework significantly reduces communication overhead (by 50%), improves convergence speed (2X), and yields a slight gain in accuracy (0.45%) with minimal redundancy compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-10-06T18:01:47Z)
- Decentralized Statistical Inference with Unrolled Graph Neural Networks [26.025935320024665]
We propose a learning-based framework, which unrolls decentralized optimization algorithms into graph neural networks (GNNs).
By minimizing the recovery error via end-to-end training, this learning-based framework resolves the model mismatch issue.
Our convergence analysis reveals that the learned model parameters may accelerate the convergence and reduce the recovery error to a large extent.
arXiv Detail & Related papers (2021-04-04T07:52:34Z)
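Several of the entries above, as well as the DistDGL baseline used in the main paper, rely on mini-batch training with layer-wise neighborhood sampling. The following is a generic, minimal sketch of that loop using DGL's dataloading API (assuming DGL 0.8+ with a PyTorch backend); the fanouts, the "feat" and "label" field names, and the model(blocks, x) signature are placeholders, not taken from any specific paper above.

```python
import dgl
import torch.nn.functional as F


def train_one_epoch(g, train_nids, model, optimizer, fanouts=(10, 5), batch_size=1024):
    """One epoch of mini-batch GNN training with layer-wise neighborhood sampling."""
    sampler = dgl.dataloading.NeighborSampler(list(fanouts))  # one fanout per GNN layer
    loader = dgl.dataloading.DataLoader(
        g, train_nids, sampler,
        batch_size=batch_size, shuffle=True, drop_last=False)
    for input_nodes, output_nodes, blocks in loader:
        x = blocks[0].srcdata["feat"]    # features of the sampled input frontier
        y = blocks[-1].dstdata["label"]  # labels of the seed (output) nodes
        loss = F.cross_entropy(model(blocks, x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Bounding the fanout at each layer keeps the per-batch computation and data movement predictable, which is the property the sampling-and-pipelining work above exploits.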
This list is automatically generated from the titles and abstracts of the papers on this site.