Papaya: Practical, Private, and Scalable Federated Learning
- URL: http://arxiv.org/abs/2111.04877v1
- Date: Mon, 8 Nov 2021 23:46:42 GMT
- Title: Papaya: Practical, Private, and Scalable Federated Learning
- Authors: Dzmitry Huba, John Nguyen, Kshitiz Malik, Ruiyu Zhu, Mike Rabbat,
Ashkan Yousefpour, Carole-Jean Wu, Hongyuan Zhan, Pavel Ustinov, Harish
Srinivas, Kaikai Wang, Anthony Shoumikhin, Jesik Min, Mani Malek
- Abstract summary: Cross-device Federated Learning (FL) is a distributed learning paradigm with several challenges.
Most FL systems described in the literature are synchronous - they perform a synchronized aggregation of model updates from individual clients.
In this work, we outline a production asynchronous FL system design.
- Score: 6.833772874570774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-device Federated Learning (FL) is a distributed learning paradigm with
several challenges that differentiate it from traditional distributed learning;
variability in the system characteristics of each device and millions of
clients coordinating with a central server are primary ones. Most FL systems
described in the literature are synchronous - they perform a synchronized
aggregation of model updates from individual clients. Scaling synchronous FL is
challenging since increasing the number of clients training in parallel leads
to diminishing returns in training speed, analogous to large-batch training.
Moreover, stragglers hinder synchronous FL training. In this work, we outline a
production asynchronous FL system design. Our work tackles the aforementioned
issues, sketches some of the system design challenges and their solutions,
and touches upon principles that emerged from building a production FL system
for millions of clients. Empirically, we demonstrate that asynchronous FL
converges faster than synchronous FL when training across nearly one hundred
million devices. In particular, in high concurrency settings, asynchronous FL
is 5x faster and has nearly 8x less communication overhead than synchronous FL.
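To make the contrast concrete, below is a minimal sketch of the two aggregation loops the abstract compares: a synchronous round that blocks on the full cohort versus an asynchronous loop that aggregates a small buffer of updates as they arrive. This is illustrative only; the model representation, the local-training stand-in, and the buffer size are assumptions, not Papaya's actual implementation.

```python
# Sketch contrasting synchronous and asynchronous (buffered) FL aggregation.
# Illustrative assumptions: models are lists of floats, client_update() stands
# in for local training, and buffer_size is an arbitrary choice.
import random

def client_update(model, seed):
    """Stand-in for on-device training: returns a pseudo model delta."""
    rng = random.Random(seed)
    return [rng.uniform(-0.01, 0.01) for _ in model]

def apply(model, delta, weight=1.0):
    return [m + weight * d for m, d in zip(model, delta)]

def sync_round(model, cohort):
    """Synchronous FL: wait for ALL sampled clients, then aggregate once.
    A single straggler delays the entire round."""
    deltas = [client_update(model, c) for c in cohort]  # blocks on the slowest
    avg = [sum(ds) / len(ds) for ds in zip(*deltas)]
    return apply(model, avg)

def async_training(model, client_stream, buffer_size=10, steps=100):
    """Asynchronous FL with buffered aggregation: the server updates as soon
    as buffer_size results arrive, so stragglers never block progress. For
    simplicity each client here reads the latest model; in a real system the
    client's base model may be stale by the time its update arrives."""
    buffer = []
    for _ in range(steps):
        c = next(client_stream)              # next client to finish, any order
        buffer.append(client_update(model, c))
        if len(buffer) >= buffer_size:
            avg = [sum(ds) / len(ds) for ds in zip(*buffer)]
            model = apply(model, avg)
            buffer.clear()
    return model

model = [0.0] * 4
model = sync_round(model, cohort=range(100))            # one synchronous round
model = async_training(model, iter(range(10_000)))     # asynchronous training
```

Note that in the asynchronous loop the server's progress is decoupled from any individual client's speed, which is the property the abstract credits for the 5x speedup in high-concurrency settings.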
Related papers
- Digital Twin-Assisted Federated Learning with Blockchain in Multi-tier Computing Systems [67.14406100332671]
In Industry 4.0 systems, resource-constrained edge devices engage in frequent data interactions.
This paper proposes a digital twin (DT)-assisted federated learning (FL) scheme.
The efficacy of our proposed cooperative interference-based FL process has been verified through numerical analysis.
arXiv Detail & Related papers (2024-11-04T17:48:02Z)
- FedAST: Federated Asynchronous Simultaneous Training [27.492821176616815]
Federated Learning (FL) enables devices or clients to collaboratively train machine learning (ML) models without sharing their private data.
Much of the existing work in FL focuses on efficiently learning a model for a single task.
In this paper, we propose simultaneous training of multiple FL models using a common set of datasets.
arXiv Detail & Related papers (2024-06-01T05:14:20Z)
- AEDFL: Efficient Asynchronous Decentralized Federated Learning with Heterogeneous Devices [61.66943750584406]
We propose an Asynchronous Efficient Decentralized FL framework, i.e., AEDFL, in heterogeneous environments.
First, we propose an asynchronous FL system model with an efficient model aggregation method for improving the FL convergence.
Second, we propose a dynamic staleness-aware model update approach to achieve superior accuracy.
Third, we propose an adaptive sparse training method to reduce communication and computation costs without significant accuracy degradation.
arXiv Detail & Related papers (2023-12-18T05:18:17Z)
- TimelyFL: Heterogeneity-aware Asynchronous Federated Learning with Adaptive Partial Training [17.84692242938424]
TimelyFL is a heterogeneity-aware asynchronous Federated Learning framework with adaptive partial training.
We show that TimelyFL improves the participation rate by 21.13%, accelerates convergence by 1.28x - 2.89x, and increases test accuracy by 6.25%.
arXiv Detail & Related papers (2023-04-14T06:26:08Z)
- Scheduling and Aggregation Design for Asynchronous Federated Learning over Wireless Networks [56.91063444859008]
Federated Learning (FL) is a collaborative machine learning framework that combines on-device training and server-based aggregation.
We propose an asynchronous FL design with periodic aggregation to tackle the straggler issue in FL systems.
We show that an "age-aware" aggregation weighting design can significantly improve the learning performance in an asynchronous FL setting (a sketch of staleness-based weighting follows this list).
arXiv Detail & Related papers (2022-12-14T17:33:01Z)
- FL Games: A Federated Learning Framework for Distribution Shifts [71.98708418753786]
Federated learning aims to train predictive models for data that is distributed across clients, under the orchestration of a server.
We propose FL GAMES, a game-theoretic framework for federated learning that learns causal features that are invariant across clients.
arXiv Detail & Related papers (2022-10-31T22:59:03Z)
- Efficient and Light-Weight Federated Learning via Asynchronous Distributed Dropout [22.584080337157168]
Asynchronous learning protocols have regained attention lately, especially in the Federated Learning (FL) setup.
We propose AsyncDrop, a novel asynchronous FL framework that utilizes dropout regularization to handle device heterogeneity in distributed settings.
Overall, AsyncDrop achieves better performance compared to state-of-the-art asynchronous methodologies.
arXiv Detail & Related papers (2022-10-28T13:00:29Z)
- Blockchain-enabled Server-less Federated Learning [5.065631761462706]
We focus on an asynchronous server-less Federated Learning solution empowered by blockchain (BC) technology.
In contrast to most commonly adopted FL approaches, we advocate an asynchronous method whereby model aggregation is performed as clients submit their local updates.
arXiv Detail & Related papers (2021-12-15T07:41:23Z)
- Device Scheduling and Update Aggregation Policies for Asynchronous Federated Learning [72.78668894576515]
Federated Learning (FL) is a newly emerged decentralized machine learning (ML) framework.
We propose an asynchronous FL framework with periodic aggregation to eliminate the straggler issue in FL systems.
arXiv Detail & Related papers (2021-07-23T18:57:08Z)
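Several entries above (the "age-aware" aggregation weighting and AEDFL's staleness-aware model update) revolve around the same idea: discount an asynchronous update according to how outdated its base model was. Below is a minimal sketch of that general pattern. The polynomial discount 1/(1+t)^a and all names are illustrative assumptions, not any specific paper's method.

```python
# Sketch of staleness-aware ("age-aware") update weighting for asynchronous FL.
# The discount function and all names are hypothetical, for illustration only.

def staleness_weight(staleness: int, exponent: float = 0.5) -> float:
    """Down-weight an update computed against a model version that is
    `staleness` server steps old: weight = 1 / (1 + staleness)^exponent."""
    return 1.0 / (1.0 + staleness) ** exponent

def apply_async_update(model, delta, client_version, server_version, lr=1.0):
    """Apply a client delta, discounted by how stale its base model was."""
    w = staleness_weight(server_version - client_version)
    return [m + lr * w * d for m, d in zip(model, delta)]

# Example: an update based on a 4-step-old model gets weight
# 1 / (1 + 4)^0.5 = 1/sqrt(5) ~ 0.447.
model = [0.0, 0.0]
delta = [0.1, -0.2]
model = apply_async_update(model, delta, client_version=6, server_version=10)
```

Fresh updates pass through nearly unchanged while very stale ones contribute little, which is how asynchronous designs keep stragglers from degrading the global model instead of blocking on them.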
This list is automatically generated from the titles and abstracts of the papers on this site.