Multi-Job Intelligent Scheduling with Cross-Device Federated Learning
- URL: http://arxiv.org/abs/2211.13430v1
- Date: Thu, 24 Nov 2022 06:17:40 GMT
- Title: Multi-Job Intelligent Scheduling with Cross-Device Federated Learning
- Authors: Ji Liu, Juncheng Jia, Beichen Ma, Chendi Zhou, Jingbo Zhou, Yang Zhou,
Huaiyu Dai, Dejing Dou
- Abstract summary: Federated Learning (FL) enables collaborative global machine learning model training without sharing sensitive raw data.
We propose a novel multi-job FL framework, which enables the training process of multiple jobs in parallel.
We propose a novel intelligent scheduling approach based on multiple scheduling methods, including an original reinforcement learning-based scheduling method and an original Bayesian optimization-based scheduling method.
- Score: 65.69079337653994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have witnessed large amounts of decentralized data on various
(edge) devices of end-users, while aggregating such decentralized data remains
complicated for machine learning jobs because of regulations and laws. As a
practical approach to handling decentralized data, Federated Learning (FL)
enables collaborative global machine learning model training without sharing
sensitive raw data. Within the training process of FL, servers schedule
devices to jobs. However, device scheduling with multiple jobs in FL remains
a critical and open problem. In this paper, we propose a novel multi-job FL
framework, which enables the training process of multiple jobs in parallel. The
multi-job FL framework is composed of a system model and a scheduling method.
The system model enables a parallel training process of multiple jobs, with a
cost model based on the data fairness and the training time of diverse devices
during the parallel training process. We propose a novel intelligent scheduling
approach based on multiple scheduling methods, including an original
reinforcement learning-based scheduling method and an original Bayesian
optimization-based scheduling method, each of which schedules devices to
multiple jobs at a small cost. We conduct extensive experiments with
diverse jobs and datasets. The experimental results reveal that our proposed
approaches significantly outperform baseline approaches in terms of training
time (up to 12.73 times faster) and accuracy (up to 46.4% higher).
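To make the cost model and scheduling loop described in the abstract concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the device profile fields, the straggler-dominated round-time estimate, the variance-based fairness penalty, and the weights `alpha` and `beta`. Plain random search stands in for the paper's reinforcement-learning and Bayesian-optimization schedulers, which search over the same kind of cost.

```python
import random
from dataclasses import dataclass

@dataclass
class Device:
    speed: float     # relative compute speed (assumed profile field)
    data_size: int   # number of local training samples

def job_round_time(devices):
    # Synchronous FL: a round lasts as long as its slowest device (straggler);
    # per-device time is modeled as local data size over compute speed.
    return max(d.data_size / d.speed for d in devices)

def fairness_penalty(counts):
    # Penalize uneven participation: variance of per-device selection counts.
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / len(counts)

def schedule_cost(schedule, devices, alpha=1.0, beta=0.1):
    # schedule maps job id -> list of device indices. The cost combines
    # total per-job round time with a data-fairness term, mirroring the
    # two ingredients named in the abstract (weights alpha/beta assumed).
    time_cost = sum(job_round_time([devices[i] for i in devs])
                    for devs in schedule.values())
    counts = [0] * len(devices)
    for devs in schedule.values():
        for i in devs:
            counts[i] += 1
    return alpha * time_cost + beta * fairness_penalty(counts)

def random_search_schedule(devices, num_jobs, per_job, trials=500):
    # Stand-in for the paper's RL / Bayesian-optimization schedulers:
    # sample disjoint device sets per job and keep the cheapest schedule.
    best, best_cost = None, float("inf")
    for _ in range(trials):
        pool = list(range(len(devices)))
        random.shuffle(pool)
        cand = {j: pool[j * per_job:(j + 1) * per_job] for j in range(num_jobs)}
        cost = schedule_cost(cand, devices)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost

devices = [Device(speed=random.uniform(0.5, 2.0),
                  data_size=random.randint(200, 2000)) for _ in range(20)]
schedule, cost = random_search_schedule(devices, num_jobs=3, per_job=5)
print(schedule, round(cost, 1))
```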
Related papers
- FedAST: Federated Asynchronous Simultaneous Training [27.492821176616815]
Federated Learning (FL) enables devices or clients to collaboratively train machine learning (ML) models without sharing their private data.
Much of the existing work in FL focuses on efficiently learning a model for a single task.
In this paper, we propose simultaneous training of multiple FL models using a common set of datasets.
arXiv Detail & Related papers (2024-06-01T05:14:20Z)
- Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting increasing attention as a way to collaboratively train a machine learning model without transferring raw data.
FL generally exploits a parameter server and a large number of edge devices throughout model training.
We propose TEASQ-Fed, which lets edge devices participate in the training process asynchronously by actively applying for tasks (the sparsification and quantization steps named in the title are sketched below).
arXiv Detail & Related papers (2023-12-23T07:47:07Z)
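As a rough sketch of the sparsification and quantization steps named in the title above (not TEASQ-Fed's actual compressor; the top-k rule, 8-bit uniform quantizer, and compression ratio are assumptions), a client could shrink its model update like this:

```python
import numpy as np

def compress_update(update, k_ratio=0.01, bits=8):
    """Top-k sparsification followed by uniform quantization of the survivors.

    Illustrative only: the sparsity ratio and bit-width are assumed values,
    not parameters from the paper.
    """
    flat = update.ravel()
    k = max(1, int(k_ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of k largest magnitudes
    vals = flat[idx]
    # Uniform quantization of the kept values to 2**bits levels.
    lo, hi = vals.min(), vals.max()
    scale = (hi - lo) / (2 ** bits - 1) or 1.0    # guard against a zero range
    q = np.round((vals - lo) / scale).astype(np.uint8)
    return idx, q, lo, scale, update.shape

def decompress_update(idx, q, lo, scale, shape):
    # Server side: scatter the dequantized values back into a zero tensor.
    flat = np.zeros(int(np.prod(shape)), dtype=np.float32)
    flat[idx] = lo + q.astype(np.float32) * scale
    return flat.reshape(shape)

update = np.random.randn(256, 128).astype(np.float32)
restored = decompress_update(*compress_update(update))
```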
- Asynchronous Multi-Model Dynamic Federated Learning over Wireless Networks: Theory, Modeling, and Optimization [20.741776617129208]
Federated learning (FL) has emerged as a key technique for distributed machine learning (ML).
We first formulate rectangular scheduling steps and functions to capture the impact of system parameters on learning performance.
Our analysis sheds light on the joint impact of device training variables and asynchronous scheduling decisions.
arXiv Detail & Related papers (2023-05-22T21:39:38Z)
- Scheduling and Aggregation Design for Asynchronous Federated Learning over Wireless Networks [56.91063444859008]
Federated Learning (FL) is a collaborative machine learning framework that combines on-device training and server-based aggregation.
We propose an asynchronous FL design with periodic aggregation to tackle the straggler issue in FL systems.
We show that an "age-aware" aggregation weighting design (sketched below) can significantly improve the learning performance in an asynchronous FL setting.
arXiv Detail & Related papers (2022-12-14T17:33:01Z)
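One common way to realize such age-aware weighting, shown here as an assumed illustration rather than the paper's actual design, is to discount each client's update by a polynomial function of its staleness before the server applies it:

```python
import numpy as np

def age_aware_aggregate(global_model, updates, decay=0.5):
    """Server-side aggregation that down-weights stale client updates.

    updates: list of (delta, staleness) pairs, where staleness counts how
    many aggregation rounds passed since the client pulled the model.
    The (1 + staleness)**-decay rule is an assumed, common staleness
    discount from the asynchronous-FL literature, not the paper's formula.
    """
    weights = np.array([(1.0 + s) ** (-decay) for _, s in updates])
    weights /= weights.sum()                      # normalize to sum to 1
    for (delta, _), w in zip(updates, weights):
        global_model += w * delta                 # staleness-discounted step
    return global_model

model = np.zeros(10)
updates = [(np.ones(10), 0), (2 * np.ones(10), 3)]  # fresh vs. 3-rounds-stale
model = age_aware_aggregate(model, updates)
```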
- Decentralized Training of Foundation Models in Heterogeneous Environments [77.47261769795992]
Training foundation models, such as GPT-3 and PaLM, can be extremely expensive.
We present the first study of training large foundation models with model parallelism in a decentralized regime over a heterogeneous network.
arXiv Detail & Related papers (2022-06-02T20:19:51Z)
- Efficient Device Scheduling with Multi-Job Federated Learning [64.21733164243781]
We propose a novel multi-job Federated Learning framework to enable the parallel training process of multiple jobs.
We propose a reinforcement learning-based method and a Bayesian optimization-based method to schedule devices for multiple jobs while minimizing the cost.
Our proposed approaches significantly outperform baseline approaches in terms of training time (up to 8.67 times faster) and accuracy (up to 44.6% higher).
arXiv Detail & Related papers (2021-12-11T08:05:11Z)
- Device Scheduling and Update Aggregation Policies for Asynchronous Federated Learning [72.78668894576515]
Federated Learning (FL) is a recently emerged decentralized machine learning (ML) framework.
We propose an asynchronous FL framework with periodic aggregation to eliminate the straggler issue in FL systems.
arXiv Detail & Related papers (2021-07-23T18:57:08Z)
- Scheduling Policy and Power Allocation for Federated Learning in NOMA Based MEC [21.267954799102874]
Federated learning (FL) is a widely studied machine learning technique that can train a model centrally while keeping data distributed.
We propose a new scheduling policy and power allocation scheme using non-orthogonal multiple access (NOMA) settings to maximize the weighted sum data rate (a toy version of this objective is sketched below).
Simulation results show that the proposed scheduling and power allocation scheme can help achieve higher FL testing accuracy in NOMA-based wireless networks.
arXiv Detail & Related papers (2020-06-21T23:07:41Z)
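The weighted-sum-rate objective in the entry above can be made concrete with a toy uplink NOMA model: devices transmit simultaneously, and the base station applies successive interference cancellation (SIC), decoding stronger signals first. The channel gains, scheduling weights, noise level, and the brute-force power grid below are all illustrative assumptions, not the paper's optimization scheme.

```python
import itertools
import numpy as np

def weighted_sum_rate(powers, gains, weights, noise=1.0):
    """Uplink NOMA with successive interference cancellation (SIC).

    The base station decodes devices in descending received-power order;
    each device sees only the not-yet-decoded devices as interference.
    """
    rx = powers * gains                      # received powers
    order = np.argsort(-rx)                  # decode strongest first
    total = 0.0
    for pos, k in enumerate(order):
        interference = rx[order[pos + 1:]].sum()
        rate = np.log2(1.0 + rx[k] / (noise + interference))
        total += weights[k] * rate
    return total

gains = np.array([1.8, 0.6, 0.3])            # assumed channel gains
weights = np.array([1.0, 1.0, 2.0])          # assumed per-device scheduling weights
grid = [0.2, 0.6, 1.0]                       # candidate transmit power levels
best = max(itertools.product(grid, repeat=3),
           key=lambda p: weighted_sum_rate(np.array(p), gains, weights))
print(best, weighted_sum_rate(np.array(best), gains, weights))
```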