FRAIN to Train: A Fast-and-Reliable Solution for Decentralized Federated Learning
- URL: http://arxiv.org/abs/2505.04223v1
- Date: Wed, 07 May 2025 08:20:23 GMT
- Title: FRAIN to Train: A Fast-and-Reliable Solution for Decentralized Federated Learning
- Authors: Sanghyeon Park, Soo-Mook Moon,
- Abstract summary: asynchronous learning (FL) enables collaborative model training across distributed clients while preserving data locality.<n>We introduce Fast-and-Reliable AI Network, FRAIN, a new FL method that mitigates these limitations by incorporating two key ideas.<n>Experiments with a CNN image-classification model and a Transformer-based language model demonstrate that FRAIN achieves more stable and robust convergence than FedAvg, FedAsync, and BRAIN.
- Score: 1.1510009152620668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated learning (FL) enables collaborative model training across distributed clients while preserving data locality. Although FedAvg pioneered synchronous rounds for global model averaging, slower devices can delay collective progress. Asynchronous FL (e.g., FedAsync) addresses stragglers by continuously integrating client updates, yet naive implementations risk client drift due to non-IID data and stale contributions. Some Blockchain-based FL approaches (e.g., BRAIN) employ robust weighting or scoring of updates to resist malicious or misaligned proposals. However, performance drops can still persist under severe data heterogeneity or high staleness, and synchronization overhead has emerged as a new concern due to its aggregator-free architectures. We introduce Fast-and-Reliable AI Network, FRAIN, a new asynchronous FL method that mitigates these limitations by incorporating two key ideas. First, our FastSync strategy eliminates the need to replay past model versions, enabling newcomers and infrequent participants to efficiently approximate the global model. Second, we adopt spherical linear interpolation (SLERP) when merging parameters, preserving models' directions and alleviating destructive interference from divergent local training. Experiments with a CNN image-classification model and a Transformer-based language model demonstrate that FRAIN achieves more stable and robust convergence than FedAvg, FedAsync, and BRAIN, especially under harsh environments: non-IID data distributions, networks that experience delays and require frequent re-synchronization, and the presence of malicious nodes.
Related papers
- Asynchronous Federated Learning with non-convex client objective functions and heterogeneous dataset [0.9208007322096533]
Tosampling Federated Learning (FL) enables collaborative model across decentralized devices while preserving stale data privacy.<n>Asynchronous Learning (AFL) addresses these by allowing clients to update independently, improving scalability and reducing delays synchronization.<n>Our framework accommodates variations in data power, distribution, and communication, making it practical for real world applications.
arXiv Detail & Related papers (2025-08-03T09:06:42Z) - Adaptive Deadline and Batch Layered Synchronized Federated Learning [66.93447103966439]
Federated learning (FL) enables collaborative model training across distributed edge devices while preserving data privacy, and typically operates in a round-based synchronous manner.<n>We propose ADEL-FL, a novel framework that jointly optimize per-round deadlines and user-specific batch sizes for layer-wise aggregation.
arXiv Detail & Related papers (2025-05-29T19:59:18Z) - Asynchronous Federated Learning: A Scalable Approach for Decentralized Machine Learning [0.9208007322096533]
Federated Learning (FL) has emerged as a powerful paradigm for decentralized machine learning, enabling collaborative model training across diverse clients without sharing raw data.<n>Traditional FL approaches often face limitations in scalability and efficiency due to their reliance on synchronous client updates.<n>We propose an Asynchronous Federated Learning (AFL) algorithm, which allows clients to update the global model independently and asynchronously.
arXiv Detail & Related papers (2024-12-23T17:11:02Z) - Asynchronous Byzantine Federated Learning [4.6792910030704515]
Federated learning (FL) enables a set of geographically distributed clients to collectively train a model through a server.
Our solution is one of the first Byzantine-resilient and asynchronous FL algorithms.
We compare the performance of our solution with state-of-the-art algorithms on both image and text datasets.
arXiv Detail & Related papers (2024-06-03T15:29:38Z) - FedAST: Federated Asynchronous Simultaneous Training [27.492821176616815]
Federated Learning (FL) enables devices or clients to collaboratively train machine learning (ML) models without sharing their private data.
Much of the existing work in FL focuses on efficiently learning a model for a single task.
In this paper, we propose simultaneous training of multiple FL models using a common set of datasets.
arXiv Detail & Related papers (2024-06-01T05:14:20Z) - An Aggregation-Free Federated Learning for Tackling Data Heterogeneity [50.44021981013037]
Federated Learning (FL) relies on the effectiveness of utilizing knowledge from distributed datasets.
Traditional FL methods adopt an aggregate-then-adapt framework, where clients update local models based on a global model aggregated by the server from the previous training round.
We introduce FedAF, a novel aggregation-free FL algorithm.
arXiv Detail & Related papers (2024-04-29T05:55:23Z) - Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates [71.81037644563217]
Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning.
As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers.
We propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion.
arXiv Detail & Related papers (2024-03-27T09:14:36Z) - AEDFL: Efficient Asynchronous Decentralized Federated Learning with
Heterogeneous Devices [61.66943750584406]
We propose an Asynchronous Efficient Decentralized FL framework, i.e., AEDFL, in heterogeneous environments.
First, we propose an asynchronous FL system model with an efficient model aggregation method for improving the FL convergence.
Second, we propose a dynamic staleness-aware model update approach to achieve superior accuracy.
Third, we propose an adaptive sparse training method to reduce communication and computation costs without significant accuracy degradation.
arXiv Detail & Related papers (2023-12-18T05:18:17Z) - Towards Instance-adaptive Inference for Federated Learning [80.38701896056828]
Federated learning (FL) is a distributed learning paradigm that enables multiple clients to learn a powerful global model by aggregating local training.
In this paper, we present a novel FL algorithm, i.e., FedIns, to handle intra-client data heterogeneity by enabling instance-adaptive inference in the FL framework.
Our experiments show that our FedIns outperforms state-of-the-art FL algorithms, e.g., a 6.64% improvement against the top-performing method with less than 15% communication cost on Tiny-ImageNet.
arXiv Detail & Related papers (2023-08-11T09:58:47Z) - SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model
Communication [11.763368822546468]
We show that SWIFT converges faster with respect to run-time due to its wait-free structure.
SWIFT produces loss levels for image classification, over IID and non-IID data settings, upwards 50% faster than existing SOTA algorithms.
arXiv Detail & Related papers (2022-10-25T14:01:21Z) - Acceleration of Federated Learning with Alleviated Forgetting in Local
Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy.
We propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage.
Our experiments demonstrate that FedReg not only significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
arXiv Detail & Related papers (2022-03-05T02:31:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.