A Quantitative Survey of Communication Optimizations in Distributed Deep
Learning
- URL: http://arxiv.org/abs/2005.13247v2
- Date: Sat, 7 Nov 2020 07:05:33 GMT
- Title: A Quantitative Survey of Communication Optimizations in Distributed Deep
Learning
- Authors: Shaohuai Shi, Zhenheng Tang, Xiaowen Chu, Chengjian Liu, Wei Wang, Bo
Li
- Abstract summary: Large and complex deep learning (DL) models are increasingly trained in a distributed manner across multiple worker machines.
Extensive communications between workers pose serious scaling problems.
We present a quantitative survey of communication optimization techniques for data parallel distributed DL.
- Score: 19.514207840069616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, large and complex deep learning (DL) models are increasingly
trained in a distributed manner across multiple worker machines, in which
extensive communications between workers pose serious scaling problems. In this
article, we present a quantitative survey of communication optimization
techniques for data parallel distributed DL. We first identify the major
communication challenges and classify the existing solutions into three levels,
namely the learning algorithm, the system architecture, and the network
infrastructure. We present the state-of-the-art communication optimization
techniques and conduct a comparative study of seven common lossless distributed
DL methods on a 32-GPU cluster with 100Gbps InfiniBand (IB). We show that (1)
the DL models with low model intensity (such as BERT and BERT-Large) are
difficult to scale out even with the best available lossless algorithm over
100Gbps IB; (2) the system architecture and scheduling algorithms have a
critical impact on the scaling property. We conclude the article with
discussions on the open issues for further investigations.
Related papers
- Unlocking Efficient Large Inference Models: One-Bit Unrolling Tips the Scales [13.846014191157405]
We introduce a novel approach that leverages one-bit algorithm unrolling, effectively integrating information from the physical world in the model architecture.
Our method achieves a bit-per-link rate significantly lower than the 1.58 bits reported in prior work.
We demonstrate that the proposed one-bit algorithm unrolling scheme can improve both training and test outcomes.
arXiv Detail & Related papers (2025-02-04T00:53:10Z) - Learning for Cross-Layer Resource Allocation in MEC-Aided Cell-Free Networks [71.30914500714262]
Cross-layer resource allocation over mobile edge computing (MEC)-aided cell-free networks can sufficiently exploit the transmitting and computing resources to promote the data rate.
Joint subcarrier allocation and beamforming optimization are investigated for the MEC-aided cell-free network from the perspective of deep learning.
arXiv Detail & Related papers (2024-12-21T10:18:55Z) - FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs)
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - Overlay-based Decentralized Federated Learning in Bandwidth-limited Networks [3.9162099309900835]
Decentralized federated learning (DFL) has the promise of boosting the deployment of artificial intelligence (AI) by directly learning across distributed agents without centralized coordination.
Most existing solutions were based on the simplistic assumption that neighboring agents are physically adjacent in the underlying communication network.
We jointly design the communication demands and the communication schedule for overlay-based DFL in bandwidth-limited networks without requiring explicit cooperation from the underlying network.
arXiv Detail & Related papers (2024-08-08T18:05:11Z) - Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey [43.57122822150023]
This article surveys the literature on algorithms and technologies aimed at achieving efficient communication in large-scale distributed deep learning.
We first introduce efficient algorithms for model synchronization and communication data compression in the context of large-scale distributed training.
Next, we introduce efficient strategies related to resource allocation and task scheduling for use in distributed training and inference.
arXiv Detail & Related papers (2024-04-09T08:35:04Z) - Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST)
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z) - Multi-agent Communication with Graph Information Bottleneck under
Limited Bandwidth (a position paper) [92.11330289225981]
In many real-world scenarios, communication can be expensive and the bandwidth of the multi-agent system is subject to certain constraints.
Redundant messages who occupy the communication resources can block the transmission of informative messages and thus jeopardize the performance.
We propose a novel multi-agent communication module, CommGIB, which effectively compresses the structure information and node information in the communication graph to deal with bandwidth-constrained settings.
arXiv Detail & Related papers (2021-12-20T07:53:44Z) - Federated Learning over Wireless IoT Networks with Optimized
Communication and Resources [98.18365881575805]
Federated learning (FL) as a paradigm of collaborative learning techniques has obtained increasing research attention.
It is of interest to investigate fast responding and accurate FL schemes over wireless systems.
We show that the proposed communication-efficient federated learning framework converges at a strong linear rate.
arXiv Detail & Related papers (2021-10-22T13:25:57Z) - Communication-Efficient Distributed Deep Learning: A Comprehensive
Survey [22.42450750097714]
We provide a comprehensive survey of the communication-efficient distributed training algorithms.
We first propose a taxonomy of data-parallel distributed training algorithms.
We then investigate state-of-the-art studies that address problems in these four dimensions.
arXiv Detail & Related papers (2020-03-10T05:42:44Z) - Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G
Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.