MixGCN: Scalable GCN Training by Mixture of Parallelism and Mixture of Accelerators
- URL: http://arxiv.org/abs/2501.01951v2
- Date: Mon, 06 Jan 2025 06:46:07 GMT
- Title: MixGCN: Scalable GCN Training by Mixture of Parallelism and Mixture of Accelerators
- Authors: Cheng Wan, Runkai Tao, Zheng Du, Yang Katie Zhao, Yingyan Celine Lin
- Abstract summary: Training graph convolutional networks (GCNs) on full graphs is challenging.
Feature tensors can easily explode the memory and block the communication bandwidth of modern accelerators.
The workflow in training GCNs alternates between sparse and dense matrix operations.
- Score: 3.598994359810843
- Abstract: Graph convolutional networks (GCNs) have demonstrated superiority in graph-based learning tasks. However, training GCNs on full graphs is particularly difficult for two reasons: (1) the associated feature tensors can easily explode the memory and block the communication bandwidth of modern accelerators, and (2) the computation workflow in training GCNs alternates between sparse and dense matrix operations, complicating the efficient utilization of computational resources. Existing solutions for scalable distributed full-graph GCN training mostly adopt partition parallelism, which is unsatisfactory because it only partially addresses the first challenge while incurring scaled-out communication volume. To this end, we propose MixGCN, which aims to address both challenges simultaneously. To tackle the first challenge, MixGCN integrates a mixture of parallelism; both theoretical and empirical analyses verify its constant communication volume and better-balanced workload. To handle the second challenge, we consider a mixture of accelerators (i.e., sparse and dense accelerators) with a dedicated accelerator for GCN training and a fine-grained pipeline. Extensive experiments show that MixGCN achieves boosted training efficiency and scalability.
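The alternation between sparse and dense operations mentioned in the abstract can be illustrated with a single GCN layer of the standard form H' = ReLU(Â H W): a sparse aggregation over neighbors, then a dense multiplication by the weight matrix. The following is a minimal pure-Python sketch under stated assumptions, not MixGCN's implementation. The names `spmm`, `gemm`, and `gcn_layer`, the toy graph, and the row-normalized adjacency are all hypothetical choices made for illustration.

```python
# Minimal sketch of one GCN layer forward pass, H' = ReLU(A_hat @ H @ W).
# All names and data below are illustrative assumptions, not taken from MixGCN.


def spmm(adj, feats):
    """Sparse step: aggregate neighbor features.

    adj maps each node to a list of (neighbor, weight) pairs, i.e. the
    nonzero entries of that node's row in the normalized adjacency matrix.
    """
    dim = len(feats[0])
    out = []
    for node in range(len(feats)):
        row = [0.0] * dim
        for nbr, w in adj.get(node, []):
            for j in range(dim):
                row[j] += w * feats[nbr][j]
        out.append(row)
    return out


def gemm(feats, weight):
    """Dense step: multiply aggregated features by the layer weight matrix."""
    in_dim, out_dim = len(weight), len(weight[0])
    return [
        [sum(row[k] * weight[k][j] for k in range(in_dim)) for j in range(out_dim)]
        for row in feats
    ]


def gcn_layer(adj, feats, weight):
    """One layer: sparse aggregation, dense transform, then ReLU."""
    aggregated = spmm(adj, feats)
    transformed = gemm(aggregated, weight)
    return [[max(0.0, v) for v in row] for row in transformed]


# Toy 3-node path graph 0 - 1 - 2 with self-loops; each row is normalized
# to sum to 1 (an assumption; symmetric normalization is also common).
adj = {
    0: [(0, 0.5), (1, 0.5)],
    1: [(0, 1 / 3), (1, 1 / 3), (2, 1 / 3)],
    2: [(1, 0.5), (2, 0.5)],
}
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 nodes, 2 input features
weight = [[1.0, -1.0], [0.5, 0.5]]             # 2 inputs -> 2 outputs

out = gcn_layer(adj, feats, weight)
```

In this sketch the cost of `spmm` scales with the number of edges and its memory access pattern is irregular, while `gemm` is a regular dense computation. That difference is the usual reason the two steps are hard to run efficiently on a single type of accelerator.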
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - Distributed Training of Large Graph Neural Networks with Variable Communication Rates [71.7293735221656]
Training Graph Neural Networks (GNNs) on large graphs presents unique challenges due to the large memory and computing requirements.
Distributed GNN training, where the graph is partitioned across multiple machines, is a common approach to training GNNs on large graphs.
We introduce a variable compression scheme for reducing the communication volume in distributed GNN training without compromising the accuracy of the learned model.
arXiv Detail & Related papers (2024-06-25T14:57:38Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - A Comprehensive Survey on Distributed Training of Graph Neural Networks [59.785830738482474]
Graph neural networks (GNNs) have been demonstrated to be a powerful algorithmic model in broad application fields.
To scale GNN training up for large-scale and ever-growing graphs, the most promising solution is distributed training.
The volume of related research on distributed GNN training is exceptionally vast, accompanied by an extraordinarily rapid pace of publication.
arXiv Detail & Related papers (2022-11-10T06:22:12Z) - BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Boundary Node Sampling [25.32242812045678]
We propose a simple yet effective method dubbed BNS-GCN that adopts random Boundary-Node-Sampling to enable efficient and scalable distributed GCN training.
Experiments and ablation studies consistently validate the effectiveness of BNS-GCN, boosting the throughput by up to 16.2x and reducing the memory usage by up to 58%, while maintaining a full-graph accuracy.
arXiv Detail & Related papers (2022-03-21T13:44:37Z) - PipeGCN: Efficient Full-Graph Training of Graph Convolutional Networks with Pipelined Feature Communication [24.05916878277873]
Graph Convolutional Networks (GCNs) are the state-of-the-art method for learning graph-structured data.
Distributed GCN training incurs prohibitive overhead of communicating node features and feature gradients among partitions.
We propose PipeGCN, a scheme that hides the communication overhead by pipelining inter-partition communication.
arXiv Detail & Related papers (2022-03-20T02:08:03Z) - GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design [27.311994997480745]
Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art graph learning model.
It can be notoriously challenging to run inference with GCNs over large graph datasets.
This paper proposes a GCN algorithm and accelerator Co-Design framework dubbed GCoD which can largely alleviate the aforementioned GCN irregularity.
arXiv Detail & Related papers (2021-12-22T00:30:50Z) - Community-based Layerwise Distributed Training of Graph Convolutional Networks [18.96786634170954]
We propose a parallel and distributed GCN training algorithm based on the Alternating Direction Method of Multipliers (ADMM).
Preliminary results demonstrate that our proposed community-based ADMM training algorithm can lead to a speedup of more than three times.
arXiv Detail & Related papers (2021-12-17T05:50:08Z) - GCNear: A Hybrid Architecture for Efficient GCN Training with Near-Memory Processing [8.130391367247793]
Graph Convolutional Networks (GCNs) have become state-of-the-art algorithms for analyzing non-euclidean graph data.
It is challenging to realize efficient GCN training, especially on large graphs.
This paper presents GCNear, a hybrid architecture to tackle these challenges.
arXiv Detail & Related papers (2021-11-01T03:47:07Z) - DeeperGCN: All You Need to Train Deeper GCNs [66.64739331859226]
Graph Convolutional Networks (GCNs) have been drawing significant attention with the power of representation learning on graphs.
Unlike Convolutional Neural Networks (CNNs), which are able to take advantage of stacking very deep layers, GCNs suffer from vanishing gradient, over-smoothing and over-fitting issues when going deeper.
This paper proposes DeeperGCN that is capable of successfully and reliably training very deep GCNs.
arXiv Detail & Related papers (2020-06-13T23:00:22Z) - L$^2$-GCN: Layer-Wise and Learned Efficient Training of Graph Convolutional Networks [118.37805042816784]
Graph convolution networks (GCN) are increasingly popular in many applications, yet remain notoriously hard to train over large graph datasets.
We propose a novel efficient layer-wise training framework for GCN (L-GCN), that disentangles feature aggregation and feature transformation during training.
Experiments show that L-GCN is faster than state-of-the-art methods by at least an order of magnitude, with consistent memory usage that does not depend on dataset size.
arXiv Detail & Related papers (2020-03-30T16:37:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.