MetisFL: An Embarrassingly Parallelized Controller for Scalable &
Efficient Federated Learning Workflows
- URL: http://arxiv.org/abs/2311.00334v2
- Date: Mon, 13 Nov 2023 07:12:55 GMT
- Authors: Dimitris Stripelis, Chrysovalantis Anastasiou, Patrick Toral, Armaghan
Asghar, Jose Luis Ambite
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A Federated Learning (FL) system typically consists of two core processing
entities: the federation controller and the learners. The controller is
responsible for managing the execution of FL workflows across learners and the
learners for training and evaluating federated models over their private
datasets. While executing an FL workflow, the FL system has no control over the
computational resources or data of the participating learners. Still, it is
responsible for other operations, such as model aggregation, task dispatching,
and scheduling. These computationally heavy operations generally need to be
handled by the federation controller. Even though many FL systems have been
recently proposed to facilitate the development of FL workflows, most of these
systems overlook the scalability of the controller. To address this gap, we
designed and developed a novel FL system called MetisFL, in which the federation
controller is a first-class citizen. MetisFL re-engineers all the operations
conducted by the federation controller to accelerate the training of
large-scale FL workflows. By quantitatively comparing MetisFL against other
state-of-the-art FL systems, we empirically demonstrate that MetisFL achieves a
10-fold wall-clock speedup across a wide range of challenging FL workflows with
increasing model sizes and federation sites.
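The abstract does not show MetisFL's internals, but the "embarrassingly parallelized" controller idea it names can be sketched: FedAvg-style weighted averaging of learners' parameters is independent per layer, so each layer can be aggregated concurrently. The function names below are illustrative assumptions, not MetisFL's actual API:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def aggregate_tensor(tensors, weights):
    """Weighted average of one parameter tensor across learners."""
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()
    return sum(wi * t for wi, t in zip(w, tensors))

def parallel_fedavg(learner_models, weights, max_workers=4):
    """Aggregate each layer independently: layers share no state,
    so aggregation is embarrassingly parallel across them."""
    n_layers = len(learner_models[0])
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(aggregate_tensor,
                        [m[i] for m in learner_models], weights)
            for i in range(n_layers)
        ]
        return [f.result() for f in futures]

# Example: two learners, two layers, weighted by local dataset size.
models = [
    [np.ones((2, 2)), np.zeros(3)],   # learner A
    [np.zeros((2, 2)), np.ones(3)],   # learner B
]
agg = parallel_fedavg(models, weights=[1, 3])  # B holds 3x the data
```

In a real controller the per-layer tasks would typically run in worker processes or native threads to sidestep Python's GIL; the thread pool here only illustrates the independence of the per-layer work.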
Related papers
- A Survey on Efficient Federated Learning Methods for Foundation Model Training (arXiv, 2024-01-09):
  Federated Learning (FL) has become an established technique to facilitate privacy-preserving collaborative training across a multitude of clients. In the wake of Foundation Models (FM), the reality is different for many deep learning applications. We discuss the benefits and drawbacks of parameter-efficient fine-tuning (PEFT) for FL applications.
- MAS: Towards Resource-Efficient Federated Multiple-Task Learning (arXiv, 2023-07-21):
  Federated learning (FL) is an emerging distributed machine learning method that empowers in-situ model training on decentralized edge devices. We propose the first FL system to effectively coordinate and train multiple simultaneous FL tasks. We present our new approach, MAS (Merge and Split), to optimize the performance of training multiple simultaneous FL tasks.
- Elastic Federated Learning over Open Radio Access Network (O-RAN) for Concurrent Execution of Multiple Distributed Learning Tasks (arXiv, 2023-04-14):
  Federated learning (FL) is a popular distributed machine learning (ML) technique in Internet of Things (IoT) networks. The implementation of FL over 5G-and-beyond wireless networks faces key challenges caused by (i) the dynamics of the wireless network conditions and (ii) the coexistence of multiple FL services in the system. We propose a novel distributed ML architecture called elastic FL (EFL) to address these challenges.
- Automated Federated Learning in Mobile Edge Networks -- Fast Adaptation and Convergence (arXiv, 2023-03-23):
  Federated Learning (FL) can be used in mobile edge networks to train machine learning models in a distributed manner. Recently, FL has been interpreted within a Model-Agnostic Meta-Learning (MAML) framework, which brings FL significant advantages in fast adaptation and convergence over heterogeneous datasets. This paper addresses how much benefit MAML brings to FL and how to maximize such benefit over mobile edge networks.
- Comparative Evaluation of Data Decoupling Techniques for Federated Machine Learning with Database as a Service (arXiv, 2023-03-15):
  Federated Learning (FL) is a machine learning approach that allows multiple clients to collaboratively learn a shared model without sharing raw data. Current FL systems provide an all-in-one solution, which can hinder the wide adoption of FL in certain domains such as scientific applications. This paper proposes a decoupling approach that enables clients to customize FL applications with specific data subsystems.
- Scheduling and Aggregation Design for Asynchronous Federated Learning over Wireless Networks (arXiv, 2022-12-14):
  Federated Learning (FL) is a collaborative machine learning framework that combines on-device training and server-based aggregation. We propose an asynchronous FL design with periodic aggregation to tackle the straggler issue in FL systems. We show that an "age-aware" aggregation weighting design can significantly improve the learning performance in an asynchronous FL setting.
- Federated Learning Hyper-Parameter Tuning from a System Perspective (arXiv, 2022-11-24):
  Federated learning (FL) is a distributed model training paradigm that preserves clients' data privacy. The current practice of manually selecting FL hyperparameters imposes a heavy burden on FL practitioners. We propose FedTune, an automatic FL hyperparameter tuning algorithm tailored to applications' diverse system requirements.
- Papaya: Practical, Private, and Scalable Federated Learning (arXiv, 2021-11-08):
  Cross-device Federated Learning (FL) is a distributed learning paradigm with several challenges. Most FL systems described in the literature are synchronous: they perform a synchronized aggregation of model updates from individual clients. In this work, we outline a production asynchronous FL system design.
- Device Scheduling and Update Aggregation Policies for Asynchronous Federated Learning (arXiv, 2021-07-23):
  Federated Learning (FL) is a newly emerged decentralized machine learning (ML) framework. We propose an asynchronous FL framework with periodic aggregation to eliminate the straggler issue in FL systems.
- Evaluation and Optimization of Distributed Machine Learning Techniques for Internet of Things (arXiv, 2021-03-03):
  Federated learning (FL) and split learning (SL) are state-of-the-art distributed machine learning techniques. Recently, FL and SL have been combined to form splitfed learning (SFL) to leverage each of their benefits. This work considers FL, SL, and SFL, and mounts them on Raspberry Pi devices to evaluate their performance.
- FedML: A Research Library and Benchmark for Federated Machine Learning (arXiv, 2020-07-27):
  Federated learning (FL) is a rapidly growing research field in machine learning. Existing FL libraries cannot adequately support diverse algorithmic development. We introduce FedML, an open research library and benchmark to facilitate FL algorithm development and fair performance comparison.
- Wireless Communications for Collaborative Federated Learning (arXiv, 2020-06-03):
  Internet of Things (IoT) devices may not be able to transmit their collected data to a central controller for training machine learning models. Google's seminal FL algorithm requires all devices to be directly connected with a central controller. This paper introduces a novel FL framework, called collaborative FL (CFL), which enables edge devices to implement FL with less reliance on a central controller.
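Several of the asynchronous FL designs listed above share one idea: updates computed against an older global model should count less during aggregation. A minimal sketch of such staleness-discounted ("age-aware") aggregation follows; the function names and the polynomial decay schedule are illustrative assumptions, not any listed paper's actual method:

```python
import numpy as np

def staleness_weight(age, alpha=0.5):
    """Discount factor for an update that is `age` rounds old
    (polynomial decay; fresh updates get weight 1.0)."""
    return (1.0 + age) ** (-alpha)

def age_aware_aggregate(base, updates):
    """Apply buffered client updates to the global model, down-weighting
    stale ones. `updates` is a list of (delta, age_in_rounds) pairs."""
    weights = [staleness_weight(age) for _, age in updates]
    total = sum(weights)
    step = sum(w * d for w, (d, _) in zip(weights, updates)) / total
    return base + step

base = np.zeros(4)
updates = [(np.ones(4), 0),        # fresh update, weight 1.0
           (np.full(4, -1.0), 3)]  # 3 rounds stale, weight (1+3)**-0.5 = 0.5
new_model = age_aware_aggregate(base, updates)
```

Here the fresh update dominates: the stale update's weight is halved, so the aggregated step moves the model toward the fresh client's direction rather than averaging the two equally.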
This list is automatically generated from the titles and abstracts of the papers in this site.