Efficient Device Scheduling with Multi-Job Federated Learning
- URL: http://arxiv.org/abs/2112.05928v2
- Date: Wed, 15 Dec 2021 11:40:35 GMT
- Title: Efficient Device Scheduling with Multi-Job Federated Learning
- Authors: Chendi Zhou, Ji Liu, Juncheng Jia, Jingbo Zhou, Yang Zhou, Huaiyu Dai,
Dejing Dou
- Abstract summary: We propose a novel multi-job Federated Learning framework that enables multiple jobs to be trained in parallel.
We propose a reinforcement learning-based method and a Bayesian optimization-based method to schedule devices for multiple jobs while minimizing the cost.
Our proposed approaches significantly outperform baseline approaches in terms of training time (up to 8.67 times faster) and accuracy (up to 44.6% higher).
- Score: 64.21733164243781
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have witnessed large amounts of decentralized data
accumulating on the (edge) devices of end users, while aggregating this
decentralized data for machine learning jobs remains difficult due to laws and
regulations. Federated Learning (FL) has emerged as an effective approach for
handling decentralized data: global machine learning models are trained
collaboratively without sharing the sensitive raw data. The servers in FL need
to select (and schedule) devices during the training process. However, the
scheduling of devices for multiple jobs with FL remains a critical and open
problem. In this paper, we propose a novel multi-job FL framework that enables
multiple jobs to be trained in parallel. The framework consists of a system
model and two scheduling methods. In the system model, we propose a parallel
training process for multiple jobs and construct a cost model based on the
training time and the data fairness of the devices participating in the
diverse jobs. We propose a reinforcement learning-based method and a
Bayesian optimization-based method to schedule devices for multiple jobs while
minimizing the cost. We conduct extensive experimentation with multiple jobs
and datasets. The experimental results show that our proposed approaches
significantly outperform baseline approaches in terms of training time (up to
8.67 times faster) and accuracy (up to 44.6% higher).
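To make the scheduling problem concrete, below is a minimal, illustrative Python sketch of device scheduling against a cost that combines per-job training time with a data-fairness penalty. All names, the exact cost form, and the random-search baseline are assumptions made for this sketch, not the paper's actual formulation; the paper's schedulers are based on reinforcement learning and Bayesian optimization.

```python
import random

# Illustrative device pool and job list; sizes and field names are
# assumptions for this sketch, not values from the paper.
random.seed(0)
DEVICES = {d: {"speed": random.uniform(0.5, 2.0)} for d in range(20)}
JOBS = [{"id": j, "work": random.uniform(1.0, 3.0)} for j in range(3)]

def cost(schedule, alpha=0.5):
    """Toy cost mixing per-job training time with a data-fairness term.

    schedule maps a job id to a disjoint list of device ids. A synchronous
    round is as slow as the slowest selected device, and uneven use of
    devices (and hence of their data) is penalized. The paper's cost model
    is more elaborate; this only mirrors its two ingredients.
    """
    time_term = sum(
        job["work"] / min(DEVICES[d]["speed"] for d in schedule[job["id"]])
        for job in JOBS
    )
    counts = [sum(d in devs for devs in schedule.values()) for d in DEVICES]
    mean = sum(counts) / len(counts)
    fairness_term = sum((c - mean) ** 2 for c in counts) / len(counts)
    return time_term + alpha * fairness_term

def random_search_scheduler(n_trials=200, per_job=5):
    """Baseline: sample disjoint device assignments, keep the cheapest.

    The paper replaces this blind search with reinforcement learning and
    Bayesian optimization, which learn where low-cost schedules lie.
    """
    best, best_cost = None, float("inf")
    for _ in range(n_trials):
        pool = list(DEVICES)
        random.shuffle(pool)
        schedule = {
            job["id"]: pool[i * per_job:(i + 1) * per_job]
            for i, job in enumerate(JOBS)
        }
        c = cost(schedule)
        if c < best_cost:
            best, best_cost = schedule, c
    return best, best_cost

if __name__ == "__main__":
    schedule, c = random_search_scheduler()
    print(f"best cost {c:.3f}: {schedule}")
```

Either learned scheduler then amounts to replacing the proposal loop in random_search_scheduler with a policy (RL) or a surrogate-guided acquisition step (Bayesian optimization) over the same cost.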
Related papers
- FedAST: Federated Asynchronous Simultaneous Training [27.492821176616815]
Federated Learning (FL) enables devices or clients to collaboratively train machine learning (ML) models without sharing their private data.
Much of the existing work in FL focuses on efficiently learning a model for a single task.
In this paper, we propose simultaneous training of multiple FL models using a common set of datasets.
arXiv Detail & Related papers (2024-06-01T05:14:20Z) - Efficient Asynchronous Federated Learning with Sparsification and
Quantization [55.6801207905772]
Federated Learning (FL) is attracting growing attention as a way to collaboratively train a machine learning model without transferring raw data.
FL generally relies on a parameter server and a large number of edge devices throughout model training.
We propose TEASQ-Fed, which exploits edge devices that asynchronously participate in the training process by actively applying for tasks.
arXiv Detail & Related papers (2023-12-23T07:47:07Z) - Speed Up Federated Learning in Heterogeneous Environment: A Dynamic
Tiering Approach [5.504000607257414]
Federated learning (FL) enables collaboratively training a model while keeping the training data decentralized and private.
One significant impediment to training models with FL, especially large ones, is the resource constraints of devices with heterogeneous computation and communication capacities, as well as varying task sizes.
We propose the Dynamic Tiering-based Federated Learning (DTFL) system where slower clients dynamically offload part of the model to the server to alleviate resource constraints and speed up training.
arXiv Detail & Related papers (2023-12-09T19:09:19Z) - Multi-Job Intelligent Scheduling with Cross-Device Federated Learning [65.69079337653994]
Federated Learning (FL) enables collaborative global machine learning model training without sharing sensitive raw data.
We propose a novel multi-job FL framework, which enables the training process of multiple jobs in parallel.
We propose a novel intelligent scheduling approach that combines multiple scheduling methods, including an original reinforcement learning-based method and an original Bayesian optimization-based method.
arXiv Detail & Related papers (2022-11-24T06:17:40Z) - Federated Learning and Meta Learning: Approaches, Applications, and
Directions [94.68423258028285]
In this tutorial, we present a comprehensive review of FL, meta learning, and federated meta learning (FedMeta).
Unlike other tutorial papers, our objective is to explore how FL, meta learning, and FedMeta methodologies can be designed, optimized, and evolved, and their applications over wireless networks.
arXiv Detail & Related papers (2022-10-24T10:59:29Z) - Decentralized Training of Foundation Models in Heterogeneous
Environments [77.47261769795992]
Training foundation models, such as GPT-3 and PaLM, can be extremely expensive.
We present the first study of training large foundation models with model parallelism in a decentralized regime over a heterogeneous network.
arXiv Detail & Related papers (2022-06-02T20:19:51Z) - Motivating Learners in Multi-Orchestrator Mobile Edge Learning: A
Stackelberg Game Approach [54.28419430315478]
Mobile Edge Learning (MEL) enables distributed training of machine learning models over heterogeneous edge devices.
In MEL, training performance deteriorates without sufficient training data or computing resources.
We propose an incentive mechanism in which the orchestrator-learner interactions are formulated as a two-round Stackelberg game.
arXiv Detail & Related papers (2021-09-25T17:27:48Z) - Scheduling Policy and Power Allocation for Federated Learning in NOMA
Based MEC [21.267954799102874]
Federated learning (FL) is a widely pursued machine learning technique that trains a model centrally while keeping data distributed.
We propose a new scheduling policy and power allocation scheme using non-orthogonal multiple access (NOMA) settings to maximize the weighted sum data rate.
Simulation results show that the proposed scheduling and power allocation scheme can help achieve a higher FL testing accuracy in NOMA based wireless networks.
arXiv Detail & Related papers (2020-06-21T23:07:41Z)