Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates
- URL: http://arxiv.org/abs/2403.18375v1
- Date: Wed, 27 Mar 2024 09:14:36 GMT
- Title: Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates
- Authors: Natalie Lang, Alejandro Cohen, Nir Shlezinger,
- Abstract summary: Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning.
As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers.
We propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion.
- Score: 71.81037644563217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning. It typically involves a set of heterogeneous devices locally training neural network (NN) models in parallel with periodic centralized aggregations. As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers. Conventional approaches discard incomplete intra-model updates done by stragglers, alter the amount of local workload and architecture, or resort to asynchronous settings; which all affect the trained model performance under tight training latency constraints. In this work, we propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion. SALF allows stragglers to synchronously convey partial gradients, having each layer of the global model be updated independently with a different contributing set of users. We provide a theoretical analysis, establishing convergence guarantees for the global model under mild assumptions on the distribution of the participating devices, revealing that SALF converges at the same asymptotic rate as FL with no timing limitations. This insight is matched with empirical observations, demonstrating the performance gains of SALF compared to alternative mechanisms mitigating the device heterogeneity gap in FL.
Related papers
- Federated Learning with MMD-based Early Stopping for Adaptive GNSS Interference Classification [4.674584508653125]
Federated learning (FL) enables multiple devices to collaboratively train a global model while maintaining data on local servers.
We propose an FL approach using few-shot learning and aggregation of the model weights on a global server.
An exemplary application of FL is orchestrating machine learning models along highways for interference classification based on snapshots from global navigation satellite system (GNSS) receivers.
arXiv Detail & Related papers (2024-10-21T06:43:04Z) - Over-the-Air Federated Learning and Optimization [52.5188988624998]
We focus on Federated learning (FL) via edge-the-air computation (AirComp)
We describe the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both convex and non- convex settings.
For different types of local updates that can be transmitted by edge devices (i.e., model, gradient, model difference), we reveal that transmitting in AirFedAvg may cause an aggregation error.
In addition, we consider more practical signal processing schemes to improve the communication efficiency and extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes.
arXiv Detail & Related papers (2023-10-16T05:49:28Z) - Semi-Asynchronous Federated Edge Learning Mechanism via Over-the-air
Computation [4.598679151181452]
We propose a semi-asynchronous aggregation FEEL mechanism with AirComp scheme (PAOTA) to improve the training efficiency of the FEEL system.
Our proposed algorithm achieves convergence performance close to that of the ideal Local SGD.
arXiv Detail & Related papers (2023-05-06T15:06:03Z) - Vertical Federated Learning over Cloud-RAN: Convergence Analysis and
System Optimization [82.12796238714589]
We propose a novel cloud radio access network (Cloud-RAN) based vertical FL system to enable fast and accurate model aggregation.
We characterize the convergence behavior of the vertical FL algorithm considering both uplink and downlink transmissions.
We establish a system optimization framework by joint transceiver and fronthaul quantization design, for which successive convex approximation and alternate convex search based system optimization algorithms are developed.
arXiv Detail & Related papers (2023-05-04T09:26:03Z) - Delay-Aware Hierarchical Federated Learning [7.292078085289465]
The paper introduces delay-aware hierarchical federated learning (DFL) to improve the efficiency of distributed machine learning (ML) model training.
During global synchronization, the cloud server consolidates local models with an outdated global model using a convex control algorithm.
Numerical evaluations show DFL's superior performance in terms of faster global model, reduced convergence resource, and evaluations against communication delays.
arXiv Detail & Related papers (2023-03-22T09:23:29Z) - Fed-FSNet: Mitigating Non-I.I.D. Federated Learning via Fuzzy
Synthesizing Network [19.23943687834319]
Federated learning (FL) has emerged as a promising privacy-preserving distributed machine learning framework.
We propose a novel FL training framework, dubbed Fed-FSNet, using a properly designed Fuzzy Synthesizing Network (FSNet) to mitigate the Non-I.I.D. at-the-source issue.
arXiv Detail & Related papers (2022-08-21T18:40:51Z) - Decentralized Event-Triggered Federated Learning with Heterogeneous
Communication Thresholds [12.513477328344255]
We propose a novel methodology for distributed model aggregations via asynchronous, event-triggered consensus iterations over a network graph topology.
We demonstrate that our methodology achieves the globally optimal learning model under standard assumptions in distributed learning and graph consensus literature.
arXiv Detail & Related papers (2022-04-07T20:35:37Z) - Parallel Successive Learning for Dynamic Distributed Model Training over
Heterogeneous Wireless Networks [50.68446003616802]
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices.
We develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions.
Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning.
arXiv Detail & Related papers (2022-02-07T05:11:01Z) - Device Scheduling and Update Aggregation Policies for Asynchronous
Federated Learning [72.78668894576515]
Federated Learning (FL) is a newly emerged decentralized machine learning (ML) framework.
We propose an asynchronous FL framework with periodic aggregation to eliminate the straggler issue in FL systems.
arXiv Detail & Related papers (2021-07-23T18:57:08Z) - Fast-Convergent Federated Learning [82.32029953209542]
Federated learning is a promising solution for distributing machine learning tasks through modern networks of mobile devices.
We propose a fast-convergent federated learning algorithm, called FOLB, which performs intelligent sampling of devices in each round of model training.
arXiv Detail & Related papers (2020-07-26T14:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.