Convergence Analysis of Asynchronous Federated Learning with Gradient Compression for Non-Convex Optimization
- URL: http://arxiv.org/abs/2504.19903v1
- Date: Mon, 28 Apr 2025 15:35:34 GMT
- Title: Convergence Analysis of Asynchronous Federated Learning with Gradient Compression for Non-Convex Optimization
- Authors: Diying Yang, Yingwei Hou, Danyang Xiao, Weigang Wu
- Abstract summary: Gradient compression is an effective technique for reducing communication costs in federated learning. In this paper, we analyze the convergence behaviors of FL under different frameworks. We prove that EF can effectively reduce the variance of gradient estimation despite asynchronous delay.
- Score: 3.6093008648437013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient compression is an effective technique for reducing communication costs in federated learning (FL), and error feedback (EF) is usually adopted to remedy the compression errors. However, there remains a lack of systematic study on these techniques in asynchronous FL. In this paper, we fill this gap by analyzing the convergence behaviors of FL under different frameworks. We first consider a basic asynchronous FL framework, AsynFL, and provide an improved convergence analysis that relies on fewer assumptions and yields a convergence rate superior to that of prior studies. Then, we consider a variant framework with gradient compression, AsynFLC. We show sufficient conditions for its convergence to the optimum, which reveal the interaction between asynchronous delay and compression rate. Our analysis also demonstrates that asynchronous delay amplifies the variance caused by compression, thereby hindering convergence, and that this impact is exacerbated by high data heterogeneity. Furthermore, we study the convergence of AsynFLC-EF, the framework that further integrates EF. We prove that EF can effectively reduce the variance of gradient estimation despite asynchronous delay, which enables AsynFLC-EF to match the convergence rate of AsynFL. We also show that the impact of asynchronous delay on EF is limited to slowing down the higher-order convergence term. Experimental results substantiate our analytical findings.
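The abstract centers on gradient compression with error feedback (EF) under asynchronous updates. As a rough illustration of the EF mechanism it refers to, below is a minimal sketch, not the paper's AsynFLC-EF algorithm: a client compresses its gradient with a biased top-k compressor, keeps the discarded part in a residual, and re-adds that residual before the next compression, while the server applies updates one at a time from possibly stale model copies. The names `topk_compress`, `EFClient`, and `async_server_step`, and the toy quadratic objective, are illustrative assumptions, not from the paper.

```python
import numpy as np

def topk_compress(v: np.ndarray, k: int) -> np.ndarray:
    """Biased top-k compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    if k > 0:
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
    return out

class EFClient:
    """Client-side error feedback (EF): the residual stores what compression
    discarded and is added back before the next compression, so the error is
    compensated over rounds rather than lost."""
    def __init__(self, dim: int, k: int):
        self.residual = np.zeros(dim)
        self.k = k

    def compress_update(self, grad: np.ndarray) -> np.ndarray:
        corrected = grad + self.residual          # error compensation
        msg = topk_compress(corrected, self.k)    # what actually gets transmitted
        self.residual = corrected - msg           # remember what was dropped
        return msg

def async_server_step(x: np.ndarray, update: np.ndarray, lr: float) -> np.ndarray:
    """Apply one (possibly stale) client update as soon as it arrives."""
    return x - lr * update

# Toy usage on f(x) = 0.5 * ||x||^2, whose gradient is x. The client computes
# its gradient at a delayed copy of the model to mimic asynchronous staleness.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, k, lr, delay = 20, 4, 0.1, 3
    x = rng.normal(size=dim)
    client = EFClient(dim, k)
    history = [x.copy()]
    for t in range(200):
        stale_x = history[max(0, len(history) - 1 - delay)]  # delayed model copy
        msg = client.compress_update(stale_x)                # gradient of 0.5*||x||^2 is x
        x = async_server_step(x, msg, lr)
        history.append(x.copy())
    print("final gradient norm:", np.linalg.norm(x))
```

This only illustrates the EF bookkeeping; the paper's contribution concerns how asynchronous delay and the compression rate jointly affect the convergence rate, which a sketch like this does not capture.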
Related papers
- Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on the client-server architecture. Non-smooth regularization is often incorporated into machine learning tasks. We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z)
- The Impact Analysis of Delays in Asynchronous Federated Learning with Data Heterogeneity for Edge Intelligence [10.54196990763149]
Federated learning (FL) has provided a new methodology for coordinating a group of clients to train a machine learning model collaboratively. This paper examines the impact of unknown causes of delay on training performance in an Asynchronous Federated Learning (AFL) system with data heterogeneity.
arXiv Detail & Related papers (2025-03-06T03:10:49Z)
- FADAS: Towards Federated Adaptive Asynchronous Optimization [56.09666452175333]
Federated learning (FL) has emerged as a widely adopted training paradigm for privacy-preserving machine learning.
This paper introduces federated adaptive asynchronous optimization, named FADAS, a novel method that incorporates asynchronous updates into adaptive federated optimization with provable guarantees.
We rigorously establish the convergence rate of the proposed algorithms and empirical results demonstrate the superior performance of FADAS over other asynchronous FL baselines.
arXiv Detail & Related papers (2024-07-25T20:02:57Z)
- Momentum Provably Improves Error Feedback! [54.93799845077906]
When left untreated, errors caused by compression propagate and can lead to exponentially divergent training behavior.
EF21-SGDM improves the communication and sample complexities of previous error feedback algorithms.
arXiv Detail & Related papers (2023-05-24T13:52:02Z)
- Analysis of Error Feedback in Federated Non-Convex Optimization with Biased Compression [37.6593006747285]
In federated learning (FL) systems, the communication cost between the clients and the central server can be a major bottleneck.
In this paper, we propose a technique to remedy the downsides of biased compression.
Under partial participation, we identify an extra slow-down factor due to a so-called "stale error accumulation" effect.
arXiv Detail & Related papers (2022-11-25T18:49:53Z)
- Time-triggered Federated Learning over Wireless Networks [48.389824560183776]
We present a time-triggered FL algorithm (TT-Fed) over wireless networks.
Our proposed TT-Fed algorithm improves the converged test accuracy over the two compared baselines by up to 12.5% and 5%, respectively.
arXiv Detail & Related papers (2022-04-26T16:37:29Z)
- Compressing gradients by exploiting temporal correlation in momentum-SGD [17.995905582226463]
We analyze compression methods that exploit temporal correlation in systems with and without error-feedback.
Experiments with the ImageNet dataset demonstrate that our proposed methods offer significant reduction in the rate of communication.
We prove the convergence of SGD under an expected error assumption by establishing a bound for the minimum gradient norm.
arXiv Detail & Related papers (2021-08-17T18:04:06Z)
- 1-Bit Compressive Sensing for Efficient Federated Learning Over the Air [32.14738452396869]
This paper develops and analyzes a communication-efficient scheme for federated learning (FL) over the air, which incorporates 1-bit compressive sensing (CS) into analog aggregation transmissions.
For scalable computing, we develop an efficient implementation that is suitable for large-scale networks.
Simulation results show that our proposed 1-bit CS based FL over the air achieves comparable performance to the ideal case.
arXiv Detail & Related papers (2021-03-30T03:50:31Z)
- Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates [67.19481956584465]
It has been experimentally observed that the efficiency of distributed training with stochastic gradient descent (SGD) depends decisively on the batch size and -- in asynchronous implementations -- on the gradient staleness.
We show that our results are tight and illustrate key findings in numerical experiments.
arXiv Detail & Related papers (2021-03-03T12:08:23Z)
- Stragglers Are Not Disaster: A Hybrid Federated Learning Algorithm with Delayed Gradients [21.63719641718363]
Federated learning (FL) is a machine learning framework that trains a joint model across a large number of decentralized computing devices.
This paper presents a novel FL algorithm, namely Hybrid Federated Learning (HFL), to achieve a learning balance in efficiency and effectiveness.
arXiv Detail & Related papers (2021-02-12T02:27:44Z)
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
The communication bottleneck has been a critical problem in large-scale deep learning. We propose a new distributed error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex distributed learning problems. We also propose DEFA to accelerate the generalization of DEF, which shows better generalization bounds than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z)