Exploring the Impact of Serverless Computing on Peer To Peer Training
Machine Learning
- URL: http://arxiv.org/abs/2309.14139v1
- Date: Mon, 25 Sep 2023 13:51:07 GMT
- Title: Exploring the Impact of Serverless Computing on Peer To Peer Training
Machine Learning
- Authors: Amine Barral, Ranim Trabelsi, Fehmi Jaafar, Fabio Petrillo
- Abstract summary: We introduce a novel architecture that combines serverless computing with P2P networks for distributed training.
Our findings show a significant enhancement in computation time, with up to a 97.34% improvement compared to conventional P2P distributed training methods.
Despite the cost-time trade-off, the serverless approach still holds promise due to its pay-as-you-go model.
- Score: 0.3441021278275805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing demand for computational power in big data and machine
learning has driven the development of distributed training methodologies.
Among these, peer-to-peer (P2P) networks provide advantages such as enhanced
scalability and fault tolerance. However, they also encounter challenges
related to resource consumption, costs, and communication overhead as the
number of participating peers grows. In this paper, we introduce a novel
architecture that combines serverless computing with P2P networks for
distributed training and present a method for efficient parallel gradient
computation under resource constraints.
Our findings show a significant enhancement in gradient computation time,
with up to a 97.34% improvement compared to conventional P2P distributed
training methods. As for costs, our examination confirmed that the serverless
architecture could incur higher expenses, reaching up to 5.4 times more than
instance-based architectures. It is essential to consider that these higher
costs are associated with marked improvements in computation time, particularly
under resource-constrained scenarios. Despite the cost-time trade-off, the
serverless approach still holds promise due to its pay-as-you-go model.
Utilizing dynamic resource allocation, it enables faster training times and
optimized resource utilization, making it a promising candidate for a wide
range of machine learning applications.
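The abstract describes, at a high level, peers that parallelize gradient computation across serverless function invocations and then aggregate the results. As a minimal sketch of that idea (not the paper's implementation), the Python snippet below shards a mini-batch, fans the shards out to stateless workers in parallel, and averages the returned gradients before a local update; a thread pool stands in for real serverless invocations (e.g., AWS Lambda), and the linear model, sharding strategy, and learning rate are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation): a peer splits a mini-batch
# into shards, "invokes" one stateless worker per shard in parallel (a thread
# pool stands in for serverless functions), and averages the returned
# gradients before applying a local SGD update.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def gradient_worker(weights, x_shard, y_shard):
    """Stateless worker: gradient of mean squared error for a linear model
    on its shard. In a real deployment this body would run inside a
    serverless function and receive its inputs via the invocation payload."""
    errors = x_shard @ weights - y_shard
    return 2.0 * x_shard.T @ errors / len(y_shard)

def parallel_gradient_step(weights, x_batch, y_batch, num_workers=4, lr=0.05):
    """Peer-side logic: shard the batch, fan out, average, update."""
    x_shards = np.array_split(x_batch, num_workers)
    y_shards = np.array_split(y_batch, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        grads = list(pool.map(gradient_worker,
                              [weights] * num_workers, x_shards, y_shards))
    avg_grad = np.mean(grads, axis=0)   # aggregate shard gradients
    return weights - lr * avg_grad      # local update on the peer

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(256, 8))
    true_w = rng.normal(size=8)
    y = x @ true_w + 0.01 * rng.normal(size=256)
    w = np.zeros(8)
    for _ in range(200):
        w = parallel_gradient_step(w, x, y)
    print(f"final training MSE: {np.mean((x @ w - y) ** 2):.4f}")
```

In the paper's setting, the fan-out would target actual serverless functions and the updated parameters would then be exchanged with other peers over the P2P network; the reported figures (up to a 97.34% improvement in gradient computation time at up to 5.4 times the cost of instance-based architectures) refer to that full system, not to this toy.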
Related papers
- SFPrompt: Communication-Efficient Split Federated Fine-Tuning for Large Pre-Trained Models over Resource-Limited Devices [10.10998320880871]
SFPrompt is a privacy-preserving fine-tuning method tailored for the federated setting.
It combines split learning with federated learning to address the resource and privacy challenges of fine-tuning large models on such devices.
Experiments demonstrate that SFPrompt delivers performance competitive with the federated full fine-tuning approach.
arXiv Detail & Related papers (2024-07-24T04:22:37Z)
- Scalable Federated Unlearning via Isolated and Coded Sharding [76.12847512410767]
Federated unlearning has emerged as a promising paradigm to erase the effect of client-level data from a trained model.
This paper proposes a scalable federated unlearning framework based on isolated sharding and coded computing.
arXiv Detail & Related papers (2024-01-29T08:41:45Z)
- A Review of Deep Reinforcement Learning in Serverless Computing: Function Scheduling and Resource Auto-Scaling [2.0722667822370386]
This paper presents a comprehensive review of the application of Deep Reinforcement Learning (DRL) techniques in serverless computing.
A systematic review of recent studies applying DRL to serverless computing is presented, covering various algorithms, models, and performance results.
Our analysis reveals that DRL, with its ability to learn and adapt from an environment, shows promising results in improving the efficiency of function scheduling and resource scaling.
arXiv Detail & Related papers (2023-10-05T09:26:04Z)
- Architecting Peer-to-Peer Serverless Distributed Machine Learning Training for Improved Fault Tolerance [1.495380389108477]
Serverless computing is a new paradigm for cloud computing that uses functions as a computational unit.
By distributing the workload, distributed machine learning can speed up the training process and allow more complex models to be trained.
We propose exploring the use of serverless computing in distributed machine learning training and comparing the performance of P2P architecture with the parameter server architecture.
arXiv Detail & Related papers (2023-02-27T17:38:47Z)
- Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
- Actively Learning Costly Reward Functions for Reinforcement Learning [56.34005280792013]
We show that it is possible to train agents in complex real-world environments orders of magnitude faster.
By enabling the application of reinforcement learning methods to new domains, we show that we can find interesting and non-trivial solutions.
arXiv Detail & Related papers (2022-11-23T19:17:20Z)
- Asynchronous Parallel Incremental Block-Coordinate Descent for Decentralized Machine Learning [55.198301429316125]
Machine learning (ML) is a key technique for big-data-driven modelling and analysis of massive Internet of Things (IoT) based intelligent and ubiquitous computing.
For fast-increasing applications and data amounts, distributed learning is a promising emerging paradigm since it is often impractical or inefficient to share/aggregate data.
This paper studies the problem of training an ML model over decentralized systems, where data are distributed over many user devices.
arXiv Detail & Related papers (2022-02-07T15:04:15Z)
- Dynamic Network-Assisted D2D-Aided Coded Distributed Learning [59.29409589861241]
We propose a novel device-to-device (D2D)-aided coded federated learning method (D2D-CFL) for load balancing across devices.
We derive an optimal compression rate for achieving minimum processing time and establish its connection with the convergence time.
Our proposed method is beneficial for real-time collaborative applications, where the users continuously generate training data.
arXiv Detail & Related papers (2021-11-26T18:44:59Z)
- HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments [37.55572042288321]
The training process of deep neural networks (DNNs) generally handles large-scale input data with many sparse features.
Paddle-HeterPS is composed of a distributed architecture and a Reinforcement Learning (RL)-based scheduling method.
We show that Paddle-HeterPS significantly outperforms state-of-the-art approaches in terms of throughput (14.5 times higher) and monetary cost (312.3% smaller).
arXiv Detail & Related papers (2021-11-20T17:09:15Z)
- Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)
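To make the last entry's notion of joint parameter-and-bandwidth allocation concrete, here is a small illustrative sketch (not the cited paper's optimization): it assigns each edge device a share of the model parameters proportional to its compute speed and a share of the uplink bandwidth proportional to its channel quality. The EdgeDevice fields and the proportional rule are assumptions made purely for illustration.

```python
# Toy sketch (illustrative only, not the cited paper's scheme): split a model's
# parameters and a shared bandwidth budget across edge devices so that faster
# devices get larger partitions and better channels get more bandwidth.
from dataclasses import dataclass

@dataclass
class EdgeDevice:
    name: str
    compute_speed: float    # relative compute capacity (assumed metric)
    channel_quality: float  # relative uplink quality (assumed metric)

def joint_allocation(devices, total_params, total_bandwidth_mbps):
    """Proportional heuristic: parameter load follows compute speed,
    bandwidth follows channel quality. Returns per-device (params, Mbps)."""
    speed_sum = sum(d.compute_speed for d in devices)
    chan_sum = sum(d.channel_quality for d in devices)
    plan = {}
    for d in devices:
        params = int(total_params * d.compute_speed / speed_sum)
        bw = total_bandwidth_mbps * d.channel_quality / chan_sum
        plan[d.name] = (params, bw)
    return plan

if __name__ == "__main__":
    devices = [EdgeDevice("phone", 1.0, 2.0),
               EdgeDevice("laptop", 4.0, 1.0),
               EdgeDevice("gateway", 2.0, 3.0)]
    for name, (params, bw) in joint_allocation(devices, 1_000_000, 100.0).items():
        print(f"{name}: {params} parameters, {bw:.1f} Mbps")
```

A real scheme would optimize the two allocations jointly (for example, to minimize the slowest device's per-round latency) rather than deciding them independently; the heuristic above only conveys the interface.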
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.