Related papers: Expediting Distributed DNN Training with Device Topology-Aware Graph Deployment

Expediting Distributed DNN Training with Device Topology-Aware Graph Deployment

URL: http://arxiv.org/abs/2302.06126v1
Date: Mon, 13 Feb 2023 06:30:24 GMT
Title: Expediting Distributed DNN Training with Device Topology-Aware Graph Deployment
Authors: Shiwei Zhang, Xiaodong Yi, Lansong Diao, Chuan Wu, Siyu Wang, and Wei Lin
Abstract summary: TAG is an automatic system to derive optimized DNN training graph and its deployment onto any device topology. We show that it can achieve up to 4.56x training speed-up as compared to existing schemes.
Score: 18.021259939659874
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents TAG, an automatic system to derive optimized DNN training graph and its deployment onto any device topology, for expedited training in device- and topology- heterogeneous ML clusters. We novelly combine both the DNN computation graph and the device topology graph as input to a graph neural network (GNN), and join the GNN with a search-based method to quickly identify optimized distributed training strategies. To reduce communication in a heterogeneous cluster, we further explore a lossless gradient compression technique and solve a combinatorial optimization problem to automatically apply the technique for training time minimization. We evaluate TAG with various representative DNN models and device topologies, showing that it can achieve up to 4.56x training speed-up as compared to existing schemes. TAG can produce efficient deployment strategies for both unseen DNN models and unseen device topologies, without heavy fine-tuning.

Related papers

FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency. We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs) We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
Graph Coordinates and Conventional Neural Networks -- An Alternative for Graph Neural Networks [0.10923877073891444]
We propose Topology Coordinate Neural Network (TCNN) and Directional Virtual Coordinate Neural Network (DVCNN) as novel alternatives to message passing GNNs. TCNN and DVCNN achieve competitive or superior performance to message passing GNNs. Our work expands the toolbox of techniques for graph-based machine learning.
arXiv Detail & Related papers (2023-12-03T10:14:10Z)
Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs. Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors. We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN)
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages transferability and stability of GNNs to achieve efficient network alignment without retraining. Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z)
Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training. We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
Learning Cooperative Beamforming with Edge-Update Empowered Graph Neural Networks [29.23937571816269]
We propose an edge-graph-neural-network (Edge-GNN) to learn the cooperative beamforming on the graph edges. The proposed Edge-GNN achieves higher sum rate with much shorter computation time than state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-23T02:05:06Z)
Distributed Graph Neural Network Training: A Survey [51.77035975191926]
Graph neural networks (GNNs) are a type of deep learning models that are trained on graphs and have been successfully applied in various domains. Despite the effectiveness of GNNs, it is still challenging for GNNs to efficiently scale to large graphs. As a remedy, distributed computing becomes a promising solution of training large-scale GNNs.
arXiv Detail & Related papers (2022-11-01T01:57:00Z)
GNN at the Edge: Cost-Efficient Graph Neural Network Processing over Distributed Edge Servers [24.109721494781592]
Graph Neural Networks (GNNs) are still under exploration, presenting a stark disparity to its broad edge adoptions. This paper studies the cost optimization for distributed GNN processing over a multi-tier heterogeneous edge network. We show that our approach achieves superior performance over de facto baselines with more than 95.8% cost eduction in a fast convergence speed.
arXiv Detail & Related papers (2022-10-31T13:03:16Z)
Distributed Graph Neural Network Training with Periodic Historical Embedding Synchronization [9.503080586294406]
Graph Neural Networks (GNNs) are prevalent in various applications such as social network, recommender systems, and knowledge graphs. Traditional sampling-based methods accelerate GNN by dropping edges and nodes, which impairs the graph integrity and model performance. This paper proposes DIstributed Graph Embedding SynchronizaTion (DIGEST), a novel distributed GNN training framework.
arXiv Detail & Related papers (2022-05-31T18:44:53Z)
Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case [93.37576644429578]
Graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice. We provide a theoretically-grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems.
arXiv Detail & Related papers (2020-06-25T00:45:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.