Related papers: Momentum Benefits Non-IID Federated Learning Simply and Provably

Momentum Benefits Non-IID Federated Learning Simply and Provably

URL: http://arxiv.org/abs/2306.16504v3
Date: Tue, 5 Mar 2024 17:51:19 GMT
Title: Momentum Benefits Non-IID Federated Learning Simply and Provably
Authors: Ziheng Cheng, Xinmeng Huang, Pengfei Wu, Kun Yuan
Abstract summary: Federated learning is a powerful paradigm for large-scale machine learning. FedAvg and SCAFFOLD are two prominent algorithms to address these challenges. This paper explores the utilization of momentum to enhance the performance of FedAvg and SCAFFOLD.
Score: 22.800862422479913
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Federated learning is a powerful paradigm for large-scale machine learning, but it faces significant challenges due to unreliable network connections, slow communication, and substantial data heterogeneity across clients. FedAvg and SCAFFOLD are two prominent algorithms to address these challenges. In particular, FedAvg employs multiple local updates before communicating with a central server, while SCAFFOLD maintains a control variable on each client to compensate for ``client drift'' in its local updates. Various methods have been proposed to enhance the convergence of these two algorithms, but they either make impractical adjustments to the algorithmic structure or rely on the assumption of bounded data heterogeneity. This paper explores the utilization of momentum to enhance the performance of FedAvg and SCAFFOLD. When all clients participate in the training process, we demonstrate that incorporating momentum allows FedAvg to converge without relying on the assumption of bounded data heterogeneity even using a constant local learning rate. This is novel and fairly surprising as existing analyses for FedAvg require bounded data heterogeneity even with diminishing local learning rates. In partial client participation, we show that momentum enables SCAFFOLD to converge provably faster without imposing any additional assumptions. Furthermore, we use momentum to develop new variance-reduced extensions of FedAvg and SCAFFOLD, which exhibit state-of-the-art convergence rates. Our experimental results support all theoretical findings.

Related papers

Stragglers Can Contribute More: Uncertainty-Aware Distillation for Asynchronous Federated Learning [61.249748418757946]
Asynchronous federated learning (FL) has recently gained attention for its enhanced efficiency and scalability.<n>We propose FedEcho, a novel framework that incorporates uncertainty-aware distillation to enhance the asynchronous FL performances.<n>We demonstrate that FedEcho consistently outperforms existing asynchronous federated learning baselines.
arXiv Detail & Related papers (2025-11-25T06:25:25Z)
Exact and Linear Convergence for Federated Learning under Arbitrary Client Participation is Attainable [9.870718388000645]
This work tackles the fundamental challenges in Federated Learning (FL)<n>It is well-established that popular FedAvg-style algorithms struggle with exact convergence.<n>We present FOCUS, Federated Optimization with Exact Convergence via Push-pull Strategy, a provably convergent algorithm.
arXiv Detail & Related papers (2025-03-25T23:54:23Z)
Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an acceleration Decentralized Federated Learning algorithm called DFedCata. DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase. Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
Efficient Federated Learning against Heterogeneous and Non-stationary Client Unavailability [23.466997173249034]
FedAPM includes novel structures that (i) for missed computations due to unavailability with only $(1)O$ additional memory computation with respect to standard FedAvg. We show that FedAPM converges to a stationary point even non-stationary algorithm despite being non-stationary dynamics.
arXiv Detail & Related papers (2024-09-26T00:38:18Z)
FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning [57.38427653043984]
Federated learning (FL) has emerged as a prominent approach for collaborative training of machine learning models across distributed clients. We introduce FedCAda, an innovative federated client adaptive algorithm designed to tackle this challenge. We demonstrate that FedCAda outperforms the state-of-the-art methods in terms of adaptability, convergence, stability, and overall performance.
arXiv Detail & Related papers (2024-05-20T06:12:33Z)
Achieving Linear Speedup in Asynchronous Federated Learning with Heterogeneous Clients [30.135431295658343]
Federated learning (FL) aims to learn a common global model without exchanging or transferring the data that are stored locally at different clients. In this paper, we propose an efficient federated learning (AFL) framework called DeFedAvg. DeFedAvg is the first AFL algorithm that achieves the desirable linear speedup property, which indicates its high scalability.
arXiv Detail & Related papers (2024-02-17T05:22:46Z)
FLASH: Federated Learning Across Simultaneous Heterogeneities [54.80435317208111]
FLASH(Federated Learning Across Simultaneous Heterogeneities) is a lightweight and flexible client selection algorithm. It outperforms state-of-the-art FL frameworks under extensive sources of Heterogeneities. It achieves substantial and consistent improvements over state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-13T20:04:39Z)
Communication-Efficient Heterogeneous Federated Learning with Generalized Heavy-Ball Momentum [19.473386008007942]
Federated Learning (FL) has emerged as the state-of-the-art approach for learning from decentralized data in privacy-constrained scenarios.<n>Despite significant research efforts, existing approaches often degrade severely due to the joint effect of heterogeneity and partial client participation.<n>In this work, we propose a novel Generalized Heavy-Ball Momentum (GHBM)<n>We show that GHBM substantially improves state-of-the-art performance under random uniform client sampling.
arXiv Detail & Related papers (2023-11-30T14:17:57Z)
FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method. We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate. We show that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z)
Federated Learning under Heterogeneous and Correlated Client Availability [10.05687757555923]
This paper presents the first convergence analysis for a FedAvg-like FL algorithm under heterogeneous and correlated client availability. We propose CA-Fed, a new FL algorithm that tries to balance the conflicting goals of maximizing convergence speed and minimizing model bias. Our experimental results show that CA-Fed achieves higher time-average accuracy and a lower standard deviation than state-of-the-art AdaFed and F3AST.
arXiv Detail & Related papers (2023-01-11T18:38:48Z)
FedSkip: Combatting Statistical Heterogeneity with Federated Skip Aggregation [95.85026305874824]
We introduce a data-driven approach called FedSkip to improve the client optima by periodically skipping federated averaging and scattering local models to the cross devices. We conduct extensive experiments on a range of datasets to demonstrate that FedSkip achieves much higher accuracy, better aggregation efficiency and competing communication efficiency.
arXiv Detail & Related papers (2022-12-14T13:57:01Z)
On the effectiveness of partial variance reduction in federated learning with heterogeneous data [27.527995694042506]
We show that the diversity of the final classification layers across clients impedes the performance of the FedAvg algorithm. Motivated by this, we propose to correct model by variance reduction only on the final layers. We demonstrate that this significantly outperforms existing benchmarks at a similar or lower communication cost.
arXiv Detail & Related papers (2022-12-05T11:56:35Z)
Speeding up Heterogeneous Federated Learning with Sequentially Trained Superclients [19.496278017418113]
Federated Learning (FL) allows training machine learning models in privacy-constrained scenarios by enabling the cooperation of edge devices without requiring local data sharing. This approach raises several challenges due to the different statistical distribution of the local datasets and the clients' computational heterogeneity. We propose FedSeq, a novel framework leveraging the sequential training of subgroups of heterogeneous clients, i.e. superclients, to emulate the centralized paradigm in a privacy-compliant way.
arXiv Detail & Related papers (2022-01-26T12:33:23Z)
Towards Fair Federated Learning with Zero-Shot Data Augmentation [123.37082242750866]
Federated learning has emerged as an important distributed learning paradigm, where a server aggregates a global model from many client-trained models while having no access to the client data. We propose a novel federated learning system that employs zero-shot data augmentation on under-represented data to mitigate statistical heterogeneity and encourage more uniform accuracy performance across clients in federated networks. We study two variants of this scheme, Fed-ZDAC (federated learning with zero-shot data augmentation at the clients) and Fed-ZDAS (federated learning with zero-shot data augmentation at the server).
arXiv Detail & Related papers (2021-04-27T18:23:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.