Momentum Benefits Non-IID Federated Learning Simply and Provably
- URL: http://arxiv.org/abs/2306.16504v3
- Date: Tue, 5 Mar 2024 17:51:19 GMT
- Title: Momentum Benefits Non-IID Federated Learning Simply and Provably
- Authors: Ziheng Cheng, Xinmeng Huang, Pengfei Wu, Kun Yuan
- Abstract summary: Federated learning is a powerful paradigm for large-scale machine learning.
FedAvg and SCAFFOLD are two prominent algorithms to address these challenges.
This paper explores the utilization of momentum to enhance the performance of FedAvg and SCAFFOLD.
- Score: 22.800862422479913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning is a powerful paradigm for large-scale machine learning,
but it faces significant challenges due to unreliable network connections, slow
communication, and substantial data heterogeneity across clients. FedAvg and
SCAFFOLD are two prominent algorithms to address these challenges. In
particular, FedAvg employs multiple local updates before communicating with a
central server, while SCAFFOLD maintains a control variable on each client to
compensate for "client drift" in its local updates. Various methods have been
proposed to enhance the convergence of these two algorithms, but they either
make impractical adjustments to the algorithmic structure or rely on the
assumption of bounded data heterogeneity.
This paper explores the utilization of momentum to enhance the performance of
FedAvg and SCAFFOLD. When all clients participate in the training process, we
demonstrate that incorporating momentum allows FedAvg to converge without
relying on the assumption of bounded data heterogeneity even using a constant
local learning rate. This is novel and fairly surprising as existing analyses
for FedAvg require bounded data heterogeneity even with diminishing local
learning rates. In partial client participation, we show that momentum enables
SCAFFOLD to converge provably faster without imposing any additional
assumptions. Furthermore, we use momentum to develop new variance-reduced
extensions of FedAvg and SCAFFOLD, which exhibit state-of-the-art convergence
rates. Our experimental results support all theoretical findings.
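The mechanics described in the abstract (several local updates per client before each communication round, plus a momentum term mixed into the local direction) can be illustrated with a small sketch. This is a toy written for illustration, not the paper's algorithm: the function names, the deterministic scalar quadratic client objectives f_i(x) = 0.5 * a_i * (x - c_i)^2, the mixing rule beta * gradient + (1 - beta) * momentum, and the choice of server step size are all assumptions. Setting beta = 1 recovers plain local-update averaging.

```python
def global_minimizer(a, c):
    """Minimizer of the average of 0.5 * a_i * (x - c_i)**2 over clients."""
    return sum(ai * ci for ai, ci in zip(a, c)) / sum(a)


def fedavg_momentum(a, c, lr=0.1, local_steps=5, beta=1.0, rounds=100):
    """Toy federated averaging with an optional momentum term.

    Hypothetical illustration only. Each client i holds the scalar
    objective 0.5 * a[i] * (x - c[i])**2 and uses exact gradients.
    beta = 1.0 disables momentum and gives plain local-update averaging.
    """
    x = 0.0  # global model
    g = 0.0  # global momentum (aggregated update direction)
    server_lr = lr * local_steps  # assumption: makes beta = 1 plain averaging
    for _ in range(rounds):
        deltas = []
        for a_i, c_i in zip(a, c):
            y = x  # every client starts the round from the global model
            for _ in range(local_steps):
                grad = a_i * (y - c_i)
                # Mix the fresh local gradient with the shared momentum.
                direction = beta * grad + (1.0 - beta) * g
                y -= lr * direction
            deltas.append(x - y)
        # Average client movement, rescaled to a per-step direction.
        g = sum(deltas) / len(deltas) / (lr * local_steps)
        x -= server_lr * g
    return x
```

With heterogeneous curvatures such as `a = [1.0, 4.0]` and `c = [0.0, 1.0]`, calling `fedavg_momentum(a, c, beta=1.0)` settles at a point that differs from `global_minimizer(a, c)`, which is one simple way to see client drift under a constant local learning rate. The sketch makes no claim about the convergence rates proved in the paper.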
Related papers
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated Decentralized Federated Learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z) - Efficient Federated Learning against Heterogeneous and Non-stationary Client Unavailability [23.466997173249034]
FedAPM includes novel structures that compensate for missed computations due to unavailability with only $O(1)$ additional memory and computation with respect to standard FedAvg.
We show that FedAPM converges to a stationary point despite non-stationary client unavailability dynamics.
arXiv Detail & Related papers (2024-09-26T00:38:18Z) - FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning [57.38427653043984]
Federated learning (FL) has emerged as a prominent approach for collaborative training of machine learning models across distributed clients.
We introduce FedCAda, an innovative federated client adaptive algorithm designed to tackle this challenge.
We demonstrate that FedCAda outperforms the state-of-the-art methods in terms of adaptability, convergence, stability, and overall performance.
arXiv Detail & Related papers (2024-05-20T06:12:33Z) - Achieving Linear Speedup in Asynchronous Federated Learning with
Heterogeneous Clients [30.135431295658343]
Federated learning (FL) aims to learn a common global model without exchanging or transferring the data that are stored locally at different clients.
In this paper, we propose an efficient asynchronous federated learning (AFL) framework called DeFedAvg.
DeFedAvg is the first AFL algorithm that achieves the desirable linear speedup property, which indicates its high scalability.
arXiv Detail & Related papers (2024-02-17T05:22:46Z) - FLASH: Federated Learning Across Simultaneous Heterogeneities [54.80435317208111]
FLASH (Federated Learning Across Simultaneous Heterogeneities) is a lightweight and flexible client selection algorithm.
It outperforms state-of-the-art FL frameworks under extensive sources of heterogeneity.
It achieves substantial and consistent improvements over state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-13T20:04:39Z) - FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup
for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z) - Federated Learning under Heterogeneous and Correlated Client
Availability [10.05687757555923]
This paper presents the first convergence analysis for a FedAvg-like FL algorithm under heterogeneous and correlated client availability.
We propose CA-Fed, a new FL algorithm that tries to balance the conflicting goals of maximizing convergence speed and minimizing model bias.
Our experimental results show that CA-Fed achieves higher time-average accuracy and a lower standard deviation than state-of-the-art AdaFed and F3AST.
arXiv Detail & Related papers (2023-01-11T18:38:48Z) - FedSkip: Combatting Statistical Heterogeneity with Federated Skip
Aggregation [95.85026305874824]
We introduce a data-driven approach called FedSkip to improve the client optima by periodically skipping federated averaging and scattering local models to the cross devices.
We conduct extensive experiments on a range of datasets to demonstrate that FedSkip achieves much higher accuracy, better aggregation efficiency and competing communication efficiency.
arXiv Detail & Related papers (2022-12-14T13:57:01Z) - On the effectiveness of partial variance reduction in federated learning
with heterogeneous data [27.527995694042506]
We show that the diversity of the final classification layers across clients impedes the performance of the FedAvg algorithm.
Motivated by this, we propose to correct the model by variance reduction only on the final layers.
We demonstrate that this significantly outperforms existing benchmarks at a similar or lower communication cost.
arXiv Detail & Related papers (2022-12-05T11:56:35Z) - Speeding up Heterogeneous Federated Learning with Sequentially Trained
Superclients [19.496278017418113]
Federated Learning (FL) allows training machine learning models in privacy-constrained scenarios by enabling the cooperation of edge devices without requiring local data sharing.
This approach raises several challenges due to the different statistical distribution of the local datasets and the clients' computational heterogeneity.
We propose FedSeq, a novel framework leveraging the sequential training of subgroups of heterogeneous clients, i.e. superclients, to emulate the centralized paradigm in a privacy-compliant way.
arXiv Detail & Related papers (2022-01-26T12:33:23Z) - Towards Fair Federated Learning with Zero-Shot Data Augmentation [123.37082242750866]
Federated learning has emerged as an important distributed learning paradigm, where a server aggregates a global model from many client-trained models while having no access to the client data.
We propose a novel federated learning system that employs zero-shot data augmentation on under-represented data to mitigate statistical heterogeneity and encourage more uniform accuracy performance across clients in federated networks.
We study two variants of this scheme, Fed-ZDAC (federated learning with zero-shot data augmentation at the clients) and Fed-ZDAS (federated learning with zero-shot data augmentation at the server).
arXiv Detail & Related papers (2021-04-27T18:23:54Z)