DynaComm: Accelerating Distributed CNN Training between Edges and Clouds
through Dynamic Communication Scheduling
- URL: http://arxiv.org/abs/2101.07968v1
- Date: Wed, 20 Jan 2021 05:09:41 GMT
- Title: DynaComm: Accelerating Distributed CNN Training between Edges and Clouds
through Dynamic Communication Scheduling
- Authors: Shangming Cai, Dongsheng Wang, Haixia Wang, Yongqiang Lyu, Guangquan
Xu, Xi Zheng and Athanasios V. Vasilakos
- Abstract summary: We present DynaComm, a novel scheduler that decomposes each transmission procedure into several segments to achieve optimal communications and computations overlapping during run-time.
We verify that DynaComm manages to achieve optimal scheduling for all cases compared to competing strategies while the model accuracy remains untouched.
- Score: 11.34309642431225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To reduce uploading bandwidth and address privacy concerns, deep learning at
the network edge has been an emerging topic. Typically, edge devices
collaboratively train a shared model using real-time generated data through the
Parameter Server framework. Although all the edge devices can share the
computing workloads, the distributed training processes over edge networks are
still time-consuming due to the parameters and gradients transmission
procedures between parameter servers and edge devices. Focusing on accelerating
distributed Convolutional Neural Networks (CNNs) training at the network edge,
we present DynaComm, a novel scheduler that dynamically decomposes each
transmission procedure into several segments to achieve optimal communications
and computations overlapping during run-time. Through experiments, we verify
that DynaComm manages to achieve optimal scheduling for all cases compared to
competing strategies while the model accuracy remains untouched.
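The idea of splitting a parameter transfer into segments so that it overlaps with computation can be illustrated with a toy timing model. The sketch below is not DynaComm's scheduling algorithm. It assumes a forward pass only, a fixed per-segment overhead, and a layer that can run once its segment has fully arrived and the preceding layers are done. It then tries every grouping by exhaustive search. The names `forward_finish_time`, `best_segmentation`, `pull`, `compute` and `overhead` are hypothetical.

```python
from itertools import combinations


def forward_finish_time(pull, compute, cuts, overhead):
    """Finish time of a forward pass under a toy cost model (an assumption).

    pull[i]    -- time to transfer the parameters of layer i
    compute[i] -- time to run the forward pass of layer i
    cuts       -- end indices of the transfer segments; the last must be len(pull)
    overhead   -- fixed cost paid once per segment
    """
    t_net = 0  # time at which the link has delivered the current segment
    t_cpu = 0  # time at which the processor finishes the layers so far
    start = 0
    for end in cuts:
        # A segment costs its fixed overhead plus the payload of its layers.
        t_net += overhead + sum(pull[start:end])
        # Its layers wait for both the segment and the previous layers.
        t_cpu = max(t_cpu, t_net) + sum(compute[start:end])
        start = end
    return t_cpu


def best_segmentation(pull, compute, overhead):
    """Exhaustive search over all contiguous groupings (fine for small models)."""
    n = len(pull)
    best = None
    for k in range(n):  # k = number of interior cut points
        for interior in combinations(range(1, n), k):
            cuts = list(interior) + [n]
            t = forward_finish_time(pull, compute, cuts, overhead)
            if best is None or t < best[0]:
                best = (t, cuts)
    return best


pull = [4, 1, 1, 2]
compute = [1, 3, 2, 2]
one_shot = forward_finish_time(pull, compute, [4], overhead=2)
layer_wise = forward_finish_time(pull, compute, [1, 2, 3, 4], overhead=2)
best_time, best_cuts = best_segmentation(pull, compute, overhead=2)
print(one_shot, layer_wise, best_time, best_cuts)
```

With these made-up numbers, sending everything in one transfer and sending one transfer per layer both take 18 time units, while grouping the layers into two segments takes 16. In this toy model the best grouping depends on the ratio of transfer cost to compute cost, which is consistent with the abstract's emphasis on deciding the decomposition dynamically at run-time.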
Related papers
- DIET: Customized Slimming for Incompatible Networks in Sequential Recommendation [16.44627200990594]
Recommender systems are starting to deploy models on edges to alleviate network congestion caused by frequent mobile requests.
Several studies have leveraged the edge side's proximity to real-time data, fine-tuning these models to create edge-specific ones.
These methods require substantial on-edge computational resources and frequent network transfers to keep the model up to date.
We propose a customizeD slImming framework for incompatiblE neTworks (DIET). DIET deploys the same generic backbone (potentially incompatible for a specific edge) to all devices.
arXiv Detail & Related papers (2024-06-13T04:39:16Z)
- Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting more and more attention to collaboratively train a machine learning model without transferring raw data.
FL generally exploits a parameter server and a large number of edge devices during the whole process of the model training.
We propose TEASQ-Fed to exploit edge devices to asynchronously participate in the training process by actively applying for tasks.
arXiv Detail & Related papers (2023-12-23T07:47:07Z)
- Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Receptive Field-based Segmentation for Distributed CNN Inference Acceleration in Collaborative Edge Computing [93.67044879636093]
We study inference acceleration using distributed convolutional neural networks (CNNs) in a collaborative edge computing network.
We propose a novel collaborative edge computing approach that uses fused-layer parallelization to partition a CNN model into multiple blocks of convolutional layers.
arXiv Detail & Related papers (2022-07-22T18:38:11Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- Layer-Parallel Training of Residual Networks with Auxiliary-Variable Networks [28.775355111614484]
Auxiliary-variable methods have attracted much interest lately but suffer from significant communication overhead and lack of data augmentation.
We present a novel joint learning framework for training realistic ResNets across multiple compute devices.
We demonstrate the effectiveness of our methods on ResNets and WideResNets across CIFAR-10, CIFAR-100, and ImageNet datasets.
arXiv Detail & Related papers (2021-12-10T08:45:35Z)
- To Talk or to Work: Delay Efficient Federated Learning over Mobile Edge Devices [13.318419040823088]
Mobile devices collaborate to train a model based on their own data under the coordination of a central server.
Without the central availability of data, computing nodes need to communicate the model updates often to attain convergence.
We propose a delay-efficient FL mechanism that reduces the overall time (consisting of both the computation and communication latencies) and communication rounds required for the model to converge.
arXiv Detail & Related papers (2021-11-01T00:35:32Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- Accelerating Neural Network Training with Distributed Asynchronous and Selective Optimization (DASO) [0.0]
We introduce the Distributed Asynchronous and Selective Optimization (DASO) method to accelerate network training.
DASO uses a hierarchical and asynchronous communication scheme comprised of node-local and global networks.
We show that DASO yields a reduction in training time of up to 34% on classical and state-of-the-art networks.
arXiv Detail & Related papers (2021-04-12T16:02:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.