GraNNDis: Efficient Unified Distributed Training Framework for Deep GNNs
on Large Clusters
- URL: http://arxiv.org/abs/2311.06837v1
- Date: Sun, 12 Nov 2023 13:30:31 GMT
- Title: GraNNDis: Efficient Unified Distributed Training Framework for Deep GNNs
on Large Clusters
- Authors: Jaeyong Song, Hongsun Jang, Jaewon Jung, Youngsok Kim, Jinho Lee
- Abstract summary: Graph neural networks (GNNs) are one of the most rapidly growing fields within deep learning.
GraNNDis is an efficient distributed GNN training framework for training GNNs on large graphs and deep layers.
GraNNDis provides superior speedup over the state-of-the-art distributed GNN training frameworks.
- Score: 8.137466511979586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph neural networks (GNNs) are one of the most rapidly growing fields
within deep learning. As the datasets and models used for GNNs grow, it becomes
nearly impossible to keep the whole network in GPU memory. Among numerous
attempts, distributed training is a popular approach to address this problem.
However, due to the nature of GNNs, existing distributed approaches suffer from
poor scalability, mainly caused by slow external server communication.
In this paper, we propose GraNNDis, an efficient distributed GNN training
framework for training GNNs on large graphs and deep layers. GraNNDis
introduces three new techniques. First, shared preloading provides a training
structure for a cluster of multi-GPU servers. We suggest server-wise preloading
of essential vertex dependencies to reduce communication over the low-bandwidth
links between servers. Second, we present expansion-aware sampling. Because
shared preloading alone is limited by neighbor explosion, expansion-aware
sampling reduces the vertex dependencies that span server boundaries. Third, we
propose cooperative batching to create a unified framework for full-graph and
mini-batch training. It significantly reduces redundant memory usage in
mini-batch training. Through this unification, GraNNDis enables a reasonable
trade-off between full-graph and mini-batch training, especially when the
entire graph does not fit into GPU memory.
With experiments conducted on a multi-server/multi-GPU cluster, we show that
GraNNDis provides superior speedup over the state-of-the-art distributed GNN
training frameworks.
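As an informal illustration of the ideas above (not the authors' implementation), the sketch below shows how server-boundary-aware sampling could be combined with server-local preloading: neighbors stored on the same server are kept in full, while neighbors that would cross a server boundary are sampled with a small fan-out. Every name in this snippet (sample_dependencies, partition, fanout_remote) is a hypothetical placeholder.

```python
# Illustrative sketch only: a simplified view of limiting cross-server
# vertex dependencies during multi-layer GNN dependency collection.
from collections import defaultdict
import random

def sample_dependencies(adj, partition, seeds, num_layers, fanout_remote=2):
    """Collect the vertices a GNN would need for `seeds`: keep every
    same-server neighbor (mimicking shared preloading), but sample at most
    `fanout_remote` neighbors stored on another server
    (an expansion-aware-sampling-style restriction)."""
    frontier = set(seeds)
    needed = set(seeds)
    for _ in range(num_layers):
        next_frontier = set()
        for v in frontier:
            local = [u for u in adj[v] if partition[u] == partition[v]]
            remote = [u for u in adj[v] if partition[u] != partition[v]]
            kept = local + random.sample(remote, min(fanout_remote, len(remote)))
            next_frontier.update(kept)
        needed |= next_frontier
        frontier = next_frontier
    return needed

# Toy example: a path graph 0-1-2-3-4-5 split across two servers.
adj = defaultdict(list, {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)})
partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # vertex -> server id
print(sample_dependencies(adj, partition, seeds=[2], num_layers=2))
```

In a real multi-server setting, the partition map would come from the graph partitioner, and the retained dependency set would be preloaded onto each server's GPUs before training begins.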
Related papers
- Optimizing Federated Learning using Remote Embeddings for Graph Neural Networks [3.836669717540222]
Graph Neural Networks (GNNs) have experienced rapid advancements in recent years due to their ability to learn meaningful representations from graph data structures. Federated Learning (FL) has emerged as a viable machine learning approach for training a shared model on decentralized data. We present OpES, an optimized federated GNN training framework that uses remote neighbourhood pruning.
arXiv Detail & Related papers (2025-06-14T09:52:24Z) - Armada: Memory-Efficient Distributed Training of Large-Scale Graph Neural Networks [14.061451788125938]
We study distributed training of Graph Neural Networks (GNNs) on billion-scale graphs that are partitioned across machines.
Efficient training in this setting relies on min-edge-cut partitioning algorithms, which minimize cross-machine communication due to GNN neighborhood sampling.
We introduce Armada, a new end-to-end system for distributed GNN training whose key contribution is GREM, a novel min-edge-cut partitioning algorithm that can efficiently scale to large graphs.
arXiv Detail & Related papers (2025-02-25T04:47:39Z) - FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - Distributed Convolutional Neural Network Training on Mobile and Edge Clusters [0.9421843976231371]
Recent efforts have emerged to localize machine learning tasks fully on the edge.
This brings advantages in reduced latency and increased privacy, but necessitates working with resource-constrained devices.
We describe an approach for distributed CNN training exclusively on mobile and edge devices.
arXiv Detail & Related papers (2024-09-11T02:44:28Z) - CDFGNN: a Systematic Design of Cache-based Distributed Full-Batch Graph Neural Network Training with Communication Reduction [7.048300785744331]
Graph neural network training is mainly categorized into mini-batch and full-batch training methods.
In the distributed cluster, frequent remote accesses of features and gradients lead to huge communication overhead.
We introduce CDFGNN, a cache-based distributed full-batch graph neural network training framework.
Our results indicate that CDFGNN has great potential in accelerating distributed full-batch GNN training tasks.
arXiv Detail & Related papers (2024-08-01T01:57:09Z) - Communication Efficient ConFederated Learning: An Event-Triggered SAGA
Approach [67.27031215756121]
Federated learning (FL) is a machine learning paradigm that targets model training without gathering the local data over various data sources.
Standard FL, which employs a single server, can only support a limited number of users, leading to degraded learning capability.
In this work, we consider a multi-server FL framework, referred to as Confederated Learning (CFL), in order to accommodate a larger number of users.
arXiv Detail & Related papers (2024-02-28T03:27:10Z) - Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z) - Timely Asynchronous Hierarchical Federated Learning: Age of Convergence [59.96266198512243]
We consider an asynchronous hierarchical federated learning setting with a client-edge-cloud framework.
The clients exchange the trained parameters with their corresponding edge servers, which update the locally aggregated model.
The goal of each client is to converge to the global model, while maintaining timeliness of the clients.
arXiv Detail & Related papers (2023-06-21T17:39:16Z) - Distributed SLIDE: Enabling Training Large Neural Networks on Low
Bandwidth and Simple CPU-Clusters via Model Parallelism and Sparsity [36.254527362066725]
This paper presents a distributed model-parallel training framework that enables training large neural networks on small CPU clusters with low Internet bandwidth.
We show that with reduced communication, due to sparsity, we can train close to a billion parameter model on simple 4-16 core CPU nodes connected by basic low bandwidth interconnect.
arXiv Detail & Related papers (2022-01-29T21:37:34Z) - Learn Locally, Correct Globally: A Distributed Algorithm for Training
Graph Neural Networks [22.728439336309858]
We propose a communication-efficient distributed GNN training technique named Learn Locally, Correct Globally (LLCG).
LLCG trains a GNN on its local data by ignoring the dependency between nodes on different machines, then sends the locally trained model to the server for periodic model averaging (a minimal sketch of this pattern appears after this list).
We rigorously analyze the convergence of distributed methods with periodic model averaging for training GNNs and show that naively applying periodic model averaging while ignoring the dependency between nodes suffers from an irreducible residual error.
arXiv Detail & Related papers (2021-11-16T03:07:01Z) - SpreadGNN: Serverless Multi-task Federated Learning for Graph Neural
Networks [13.965982814292971]
Graph Neural Networks (GNNs) are the first choice methods for graph machine learning problems.
Centralizing a massive amount of real-world graph data for GNN training is prohibitive due to user-side privacy concerns.
This work proposes SpreadGNN, a novel multi-task federated training framework.
arXiv Detail & Related papers (2021-06-04T22:20:47Z) - DistGNN: Scalable Distributed Training for Large-Scale Graph Neural
Networks [58.48833325238537]
Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z) - Towards Deeper Graph Neural Networks with Differentiable Group
Normalization [61.20639338417576]
Graph neural networks (GNNs) learn the representation of a node by aggregating its neighbors.
Over-smoothing is one of the key issues which limit the performance of GNNs as the number of layers increases.
We introduce two over-smoothing metrics and a novel technique, differentiable group normalization (DGN).
arXiv Detail & Related papers (2020-06-12T07:18:02Z)
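As mentioned in the LLCG entry above, periodic model averaging is the core mechanism there: each machine trains on its local subgraph without cross-machine communication, and a server periodically averages the model parameters. The following minimal sketch shows that general pattern with a toy parameter vector standing in for a real GNN; the names and objectives are illustrative assumptions, not LLCG's actual code.

```python
# Minimal sketch of local training plus periodic model averaging.
import numpy as np

def local_step(params, local_grad_fn, lr=0.1):
    """One local SGD step using only locally available (sub)graph data."""
    return params - lr * local_grad_fn(params)

def train_with_periodic_averaging(grad_fns, dim=4, rounds=5, local_steps=10):
    workers = [np.zeros(dim) for _ in grad_fns]       # one model replica per machine
    avg = np.zeros(dim)
    for _ in range(rounds):
        for i, g in enumerate(grad_fns):
            for _ in range(local_steps):              # local training, no communication
                workers[i] = local_step(workers[i], g)
        avg = np.mean(workers, axis=0)                # server-side model averaging
        workers = [avg.copy() for _ in workers]       # broadcast the averaged model
    return avg

# Toy quadratic objectives stand in for per-machine GNN losses.
targets = [np.full(4, t) for t in (1.0, 2.0, 3.0)]
grad_fns = [lambda p, t=t: p - t for t in targets]    # gradient of 0.5 * ||p - t||^2
print(train_with_periodic_averaging(grad_fns))        # approaches the mean target, 2.0
```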