AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training
- URL: http://arxiv.org/abs/2311.05827v1
- Date: Fri, 10 Nov 2023 02:18:33 GMT
- Title: AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training
- Authors: Yuhao Chen, Yuxuan Yan, Qianqian Yang, Yuanchao Shu, Shibo He, Zhiguo Shi, Jiming Chen
- Abstract summary: We propose AccEPT, an acceleration scheme for edge collaborative pipeline-parallel training.
In particular, we propose a lightweight adaptive latency predictor to accurately estimate the latency of each layer on different devices.
Our numerical results demonstrate that the proposed acceleration approach speeds up edge pipeline-parallel training by up to 3 times.
- Score: 22.107070114339038
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: It is usually infeasible to fit and train an entire large deep neural network
(DNN) model on a single edge device due to its limited resources. To
facilitate intelligent applications across edge devices, researchers have
proposed partitioning a large model into several sub-models and deploying each
of them on a different edge device so that they collaboratively train the DNN model.
However, training can be significantly slowed down both by the communication
overhead caused by the large amount of data transmitted from one device to
another during training, and by sub-optimal partition points resulting from
inaccurate predictions of the computation latency at each edge device. In this
paper, we propose AccEPT, an acceleration scheme for edge collaborative
pipeline-parallel training. In particular, we propose a lightweight adaptive
latency predictor to accurately estimate the computation latency of each layer
on different devices, which also adapts to unseen devices through continuous
learning. The proposed latency predictor therefore leads to better model
partitioning, balancing the computation load across participating devices.
Moreover, we propose a bit-level, computation-efficient data compression
scheme to compress the data transmitted between devices during training. Our
numerical results demonstrate that the proposed acceleration approach speeds
up edge pipeline-parallel training by up to 3 times in the considered
experimental settings.
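The abstract does not spell out the partitioning algorithm itself, but the role of the latency predictor can be illustrated: given predicted per-layer compute times, choose contiguous pipeline stages so that the slowest stage, which bounds pipeline throughput, is as fast as possible. The Python sketch below is a hypothetical illustration of that idea, not AccEPT's actual method; the function names (`partition_layers`, `best`) and the per-layer numbers are ours, and for simplicity each layer's latency is assumed to already reflect the device it would run on, whereas the paper's predictor estimates latency per layer per device.

```python
# Hypothetical sketch: balance pipeline stages using predicted per-layer
# latencies. Not AccEPT's algorithm; a minimal dynamic program that splits
# consecutive layers into stages so the slowest stage is as fast as possible.
from functools import lru_cache

def partition_layers(latency, num_devices):
    """Split `latency` (per-layer compute times) into `num_devices`
    contiguous stages, minimizing the slowest stage (the pipeline
    bottleneck). Returns (bottleneck, list of cut indices)."""
    n = len(latency)
    prefix = [0.0]
    for t in latency:
        prefix.append(prefix[-1] + t)

    @lru_cache(maxsize=None)
    def best(i, k):
        # Minimal bottleneck for layers i..n-1 split across k devices.
        if k == 1:
            return prefix[n] - prefix[i], ()
        result = (float("inf"), ())
        for j in range(i + 1, n - k + 2):  # first cut after layer j-1
            stage = prefix[j] - prefix[i]
            rest, cuts = best(j, k - 1)
            cand = (max(stage, rest), (j,) + cuts)
            if cand[0] < result[0]:
                result = cand
        return result

    bottleneck, cuts = best(0, num_devices)
    return bottleneck, list(cuts)

# Made-up predicted per-layer latencies (ms) for an 8-layer model.
predicted = [4.0, 7.5, 3.2, 9.1, 2.4, 6.6, 5.0, 8.3]
bottleneck, cuts = partition_layers(predicted, num_devices=3)
print(f"cut before layers {cuts}, bottleneck stage = {bottleneck:.1f} ms")
```

In a real deployment, the predictor's per-device estimates would feed a search of this kind, and the model could be re-partitioned whenever continuous learning updates the predictor for a new device.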
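Similarly, the details of the bit-level, computation-efficient compression scheme are not given in the abstract. As a generic stand-in, the sketch below uses plain 8-bit linear quantization (a standard technique, not necessarily the paper's) to show how the activations and gradients exchanged between pipeline stages could be shrunk roughly 4x before transmission.

```python
# Generic stand-in for inter-device data compression, NOT the paper's scheme:
# quantize float32 tensors to uint8 plus a (scale, offset) pair before sending.
import numpy as np

def compress(activations: np.ndarray):
    """Quantize a float32 tensor to uint8 with per-tensor scale/offset."""
    lo, hi = float(activations.min()), float(activations.max())
    scale = (hi - lo) / 255.0 or 1.0  # avoid div-by-zero on constant input
    q = np.round((activations - lo) / scale).astype(np.uint8)
    return q, scale, lo

def decompress(q: np.ndarray, scale: float, lo: float) -> np.ndarray:
    """Reconstruct an approximate float32 tensor on the receiving device."""
    return q.astype(np.float32) * scale + lo

acts = np.random.randn(64, 128).astype(np.float32)
q, scale, lo = compress(acts)
restored = decompress(q, scale, lo)
print(f"bytes: {acts.nbytes} -> {q.nbytes}, "
      f"max abs error: {np.abs(acts - restored).max():.4f}")
```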
Related papers
- Edge-Enabled Real-time Railway Track Segmentation [0.0]
We propose an edge-enabled real-time railway track segmentation algorithm.
It is tailored to edge applications by optimizing the network structure and quantizing the model after training.
Experimental results demonstrate that our enhanced algorithm achieves an accuracy level of 83.3%.
arXiv Detail & Related papers (2024-01-21T13:45:52Z)
- Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting increasing attention as a way to collaboratively train a machine learning model without transferring raw data.
FL generally relies on a parameter server and a large number of edge devices throughout model training.
We propose TEASQ-Fed, which lets edge devices asynchronously participate in training by actively applying for tasks.
arXiv Detail & Related papers (2023-12-23T07:47:07Z)
- Design and Prototyping Distributed CNN Inference Acceleration in Edge Computing [85.74517957717363]
HALP accelerates inference through seamless collaboration among edge devices (EDs) in edge computing.
Experiments show that the distributed inference HALP achieves 1.7x inference acceleration for VGG-16.
It is shown that the model selection with distributed inference HALP can significantly improve service reliability.
arXiv Detail & Related papers (2022-11-24T19:48:30Z)
- Efficient Graph Neural Network Inference at Large Scale [54.89457550773165]
Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications.
Existing scalable GNNs leverage linear propagation to preprocess the features and accelerate the training and inference procedure.
We propose a novel adaptive propagation order approach that generates the personalized propagation order for each node based on its topological information.
arXiv Detail & Related papers (2022-11-01T14:38:18Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) approach, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
With a latency- and accuracy-aware reward design, the computation adapts well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Fast-Convergent Federated Learning [82.32029953209542]
Federated learning is a promising solution for distributing machine learning tasks through modern networks of mobile devices.
We propose a fast-convergent federated learning algorithm, called FOLB, which performs intelligent sampling of devices in each round of model training.
arXiv Detail & Related papers (2020-07-26T14:37:51Z)
- Joint Device Scheduling and Resource Allocation for Latency Constrained Wireless Federated Learning [26.813145949399427]
In federated learning (FL), devices upload their local model updates via wireless channels.
We propose a joint device scheduling and resource allocation policy to maximize the model accuracy.
Experiments show that the proposed policy outperforms state-of-the-art scheduling policies.
arXiv Detail & Related papers (2020-07-14T16:46:47Z)
- Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)