Related papers: A Quantitative Survey of Communication Optimizations in Distributed Deep Learning

A Quantitative Survey of Communication Optimizations in Distributed Deep Learning

URL: http://arxiv.org/abs/2005.13247v2
Date: Sat, 7 Nov 2020 07:05:33 GMT
Title: A Quantitative Survey of Communication Optimizations in Distributed Deep Learning
Authors: Shaohuai Shi, Zhenheng Tang, Xiaowen Chu, Chengjian Liu, Wei Wang, Bo Li
Abstract summary: Large and complex deep learning (DL) models are increasingly trained in a distributed manner across multiple worker machines. Extensive communications between workers pose serious scaling problems. We present a quantitative survey of communication optimization techniques for data parallel distributed DL.
Score: 19.514207840069616
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Nowadays, large and complex deep learning (DL) models are increasingly trained in a distributed manner across multiple worker machines, in which extensive communications between workers pose serious scaling problems. In this article, we present a quantitative survey of communication optimization techniques for data parallel distributed DL. We first identify the major communication challenges and classify the existing solutions into three levels, namely the learning algorithm, the system architecture, and the network infrastructure. We present the state-of-the-art communication optimization techniques and conduct a comparative study of seven common lossless distributed DL methods on a 32-GPU cluster with 100Gbps InfiniBand (IB). We show that (1) the DL models with low model intensity (such as BERT and BERT-Large) are difficult to scale out even with the best available lossless algorithm over 100Gbps IB; (2) the system architecture and scheduling algorithms have a critical impact on the scaling property. We conclude the article with discussions on the open issues for further investigations.

Related papers

Communication Optimization for Decentralized Learning atop Bandwidth-limited Edge Networks [4.880664732766839]
Decentralized federated learning (DFL) is a promising machine learning paradigm for bringing artificial intelligence (AI) capabilities to the network edge. Running DFL on top of edge networks, however, faces severe performance challenges due to the extensive parameter exchanges between agents. We jointly design the communication scheme for the overlay network formed by the agents and the mixing matrix that controls the communication demands between the agents. Our evaluations show that the proposed algorithm can reduce the total training time by over $80%$ compared to the baseline.
arXiv Detail & Related papers (2025-04-16T15:56:57Z)
Unlocking Efficient Large Inference Models: One-Bit Unrolling Tips the Scales [13.846014191157405]
We introduce a novel approach that leverages one-bit algorithm unrolling, effectively integrating information from the physical world in the model architecture. Our method achieves a bit-per-link rate significantly lower than the 1.58 bits reported in prior work. We demonstrate that the proposed one-bit algorithm unrolling scheme can improve both training and test outcomes.
arXiv Detail & Related papers (2025-02-04T00:53:10Z)
Learning for Cross-Layer Resource Allocation in MEC-Aided Cell-Free Networks [71.30914500714262]
Cross-layer resource allocation over mobile edge computing (MEC)-aided cell-free networks can sufficiently exploit the transmitting and computing resources to promote the data rate. Joint subcarrier allocation and beamforming optimization are investigated for the MEC-aided cell-free network from the perspective of deep learning.
arXiv Detail & Related papers (2024-12-21T10:18:55Z)
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency. We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs) We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
Rephrase and Contrast: Fine-Tuning Language Models for Enhanced Understanding of Communication and Computer Networks [13.829525575305206]
This paper introduces our Rephrase and Contrast (RaC) framework, an efficient fine-tuning framework. RaC enhances LLMs' comprehension and critical thinking abilities by incorporating question reformulation and contrastive analysis. To efficiently construct the dataset for RaC fine-tuning, we develop a GPT-assisted data mining method for generating high-quality question-answer pairs.
arXiv Detail & Related papers (2024-09-21T16:04:43Z)
Overlay-based Decentralized Federated Learning in Bandwidth-limited Networks [3.9162099309900835]
Decentralized federated learning (DFL) has the promise of boosting the deployment of artificial intelligence (AI) by directly learning across distributed agents without centralized coordination. Most existing solutions were based on the simplistic assumption that neighboring agents are physically adjacent in the underlying communication network. We jointly design the communication demands and the communication schedule for overlay-based DFL in bandwidth-limited networks without requiring explicit cooperation from the underlying network.
arXiv Detail & Related papers (2024-08-08T18:05:11Z)
Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey [43.57122822150023]
This article surveys the literature on algorithms and technologies aimed at achieving efficient communication in large-scale distributed deep learning. We first introduce efficient algorithms for model synchronization and communication data compression in the context of large-scale distributed training. Next, we introduce efficient strategies related to resource allocation and task scheduling for use in distributed training and inference.
arXiv Detail & Related papers (2024-04-09T08:35:04Z)
Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST) IST is a recently proposed and highly effective technique for solving the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
Multi-agent Communication with Graph Information Bottleneck under Limited Bandwidth (a position paper) [92.11330289225981]
In many real-world scenarios, communication can be expensive and the bandwidth of the multi-agent system is subject to certain constraints. Redundant messages who occupy the communication resources can block the transmission of informative messages and thus jeopardize the performance. We propose a novel multi-agent communication module, CommGIB, which effectively compresses the structure information and node information in the communication graph to deal with bandwidth-constrained settings.
arXiv Detail & Related papers (2021-12-20T07:53:44Z)
Federated Learning over Wireless IoT Networks with Optimized Communication and Resources [98.18365881575805]
Federated learning (FL) as a paradigm of collaborative learning techniques has obtained increasing research attention. It is of interest to investigate fast responding and accurate FL schemes over wireless systems. We show that the proposed communication-efficient federated learning framework converges at a strong linear rate.
arXiv Detail & Related papers (2021-10-22T13:25:57Z)
A Tutorial on Ultra-Reliable and Low-Latency Communications in 6G: Integrating Domain Knowledge into Deep Learning [115.75967665222635]
Ultra-reliable and low-latency communications (URLLC) will be central for the development of various emerging mission-critical applications. Deep learning algorithms have been considered as promising ways of developing enabling technologies for URLLC in future 6G networks. This tutorial illustrates how domain knowledge can be integrated into different kinds of deep learning algorithms for URLLC.
arXiv Detail & Related papers (2020-09-13T14:53:01Z)
Communication-Efficient Distributed Deep Learning: A Comprehensive Survey [22.42450750097714]
We provide a comprehensive survey of the communication-efficient distributed training algorithms. We first propose a taxonomy of data-parallel distributed training algorithms. We then investigate state-of-the-art studies that address problems in these four dimensions.
arXiv Detail & Related papers (2020-03-10T05:42:44Z)
Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC. To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.