Analysis of Error Feedback in Federated Non-Convex Optimization with Biased Compression
- URL: http://arxiv.org/abs/2211.14292v1
- Date: Fri, 25 Nov 2022 18:49:53 GMT
- Title: Analysis of Error Feedback in Federated Non-Convex Optimization with Biased Compression
- Authors: Xiaoyun Li and Ping Li
- Abstract summary: In federated learning (FL) systems, the communication cost between the clients and the central server can be a bottleneck.
In this paper, we study Fed-EF, a compressed FL scheme equipped with error feedback to remedy the downsides of biased compression.
Under partial client participation, we prove that the convergence rate exhibits an extra slow-down factor due to a so-called "stale error compensation" effect.
- Score: 37.6593006747285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In federated learning (FL) systems, e.g., wireless networks, the
communication cost between the clients and the central server can often be a
bottleneck. To reduce the communication cost, the paradigm of communication
compression has become a popular strategy in the literature. In this paper, we
focus on biased gradient compression techniques in non-convex FL problems. In
the classical setting of distributed learning, the method of error feedback
(EF) is a common technique to remedy the downsides of biased gradient
compression. In this work, we study a compressed FL scheme equipped with error
feedback, named Fed-EF. We further propose two variants: Fed-EF-SGD and
Fed-EF-AMS, depending on the choice of the global model optimizer. We provide a
generic theoretical analysis, which shows that directly applying biased
compression in FL leads to a non-vanishing bias in the convergence rate. The
proposed Fed-EF is able to match the convergence rate of the full-precision FL
counterparts under data heterogeneity with a linear speedup.
Moreover, we develop a new analysis of the EF under partial client
participation, which is an important scenario in FL. We prove that under
partial participation, the convergence rate of Fed-EF exhibits an extra
slow-down factor due to a so-called "stale error compensation" effect. A
numerical study is conducted to justify the intuitive impact of stale error
accumulation on the norm convergence of Fed-EF under partial participation.
Finally, we also demonstrate that incorporating two-way compression into
Fed-EF does not change the convergence results. In summary, our work conducts a
thorough analysis of the error feedback in federated non-convex optimization.
Our analysis with partial client participation also provides insight into a
theoretical limitation of the error feedback mechanism and possible directions
for improvement.
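To make the mechanism concrete, below is a minimal sketch of error feedback with a biased top-k compressor inside a FedAvg-style loop, with client sampling so the stale-error effect described above is visible. Everything here (function names, the toy quadratic objective, hyperparameters) is an illustrative assumption rather than the authors' implementation; the server update mirrors the Fed-EF-SGD variant, and replacing it with an AMSGrad-style rule would correspond to Fed-EF-AMS.

```python
# Hedged sketch of error feedback (EF) with biased top-k compression in a
# FedAvg-style loop. Toy objective and all hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def top_k(v, k):
    """Biased top-k compressor: keep only the k largest-magnitude coordinates."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def local_grad(w, client_id):
    """Stand-in stochastic gradient of a toy quadratic objective per client."""
    return (w - client_targets[client_id]) + 0.01 * rng.standard_normal(w.shape)

d, n_clients, k = 100, 10, 10        # dimension, number of clients, top-k budget
local_steps, rounds, m = 5, 200, 5   # local steps, rounds, sampled clients per round
eta_l, eta_g = 0.1, 0.5              # local and global step sizes

client_targets = rng.standard_normal((n_clients, d))  # heterogeneous local optima
w = np.zeros(d)                                        # global model
err = np.zeros((n_clients, d))                         # per-client error-feedback buffers

for t in range(rounds):
    sampled = rng.choice(n_clients, size=m, replace=False)  # partial participation
    agg = np.zeros(d)
    for i in sampled:
        w_i = w.copy()
        for _ in range(local_steps):          # local SGD from the current global model
            w_i -= eta_l * local_grad(w_i, i)
        delta = w - w_i                       # local update to be communicated
        compensated = delta + err[i]          # error feedback: add the stored residual
        msg = top_k(compensated, k)           # biased compression of the update
        err[i] = compensated - msg            # keep the new residual on the client
        agg += msg
    # Global SGD step on the average of compressed updates (Fed-EF-SGD flavor).
    w -= eta_g * (agg / m)

print("distance to average target:", np.linalg.norm(w - client_targets.mean(axis=0)))
```

With full participation every client's error buffer is refreshed each round; under partial participation a buffer can sit untouched for several rounds before being compensated back in, which is the intuition behind the extra slow-down factor discussed in the abstract.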
Related papers
- Harnessing the Power of Federated Learning in Federated Contextual Bandits [20.835106310302876]
Federated contextual bandits (FCB) are a pivotal integration of FL and sequential decision-making.
Existing FCB approaches have largely employed tailored FL components, often deviating from the canonical FL framework.
In particular, a novel FCB design, termed FedIGW, is proposed to leverage a regression-based CB algorithm.
arXiv Detail & Related papers (2023-12-26T21:44:09Z)
- Over-the-Air Federated Learning and Optimization [52.5188988624998]
We focus on federated learning (FL) via over-the-air computation (AirComp).
We describe the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both convex and non-convex settings.
For the different types of local updates that can be transmitted by edge devices (i.e., model, gradient, model difference), we reveal that transmitting them in AirFedAvg may cause an aggregation error.
In addition, we consider more practical signal processing schemes to improve the communication efficiency and extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes.
arXiv Detail & Related papers (2023-10-16T05:49:28Z)
- FedNAR: Federated Optimization with Normalized Annealing Regularization [54.42032094044368]
We explore the choice of weight decay and identify that the weight decay value appreciably influences the convergence of existing FL algorithms.
We develop Federated optimization with Normalized Annealing Regularization (FedNAR), a plug-in that can be seamlessly integrated into any existing FL algorithms.
arXiv Detail & Related papers (2023-10-04T21:11:40Z)
- Federated Conformal Predictors for Distributed Uncertainty Quantification [83.50609351513886]
Conformal prediction is emerging as a popular paradigm for providing rigorous uncertainty quantification in machine learning.
In this paper, we extend conformal prediction to the federated learning setting.
We propose a weaker notion of partial exchangeability, better suited to the FL setting, and use it to develop the Federated Conformal Prediction framework.
arXiv Detail & Related papers (2023-05-27T19:57:27Z)
- On The Impact of Client Sampling on Federated Learning Convergence [4.530678016396477]
We introduce a novel decomposition theorem for the convergence of FL, which allows us to clearly quantify the impact of client sampling on the global model update.
Our results suggest that MD sampling should be used as the default sampling scheme, due to its resilience to changes in the data ratio during the learning process, while Uniform sampling is superior only in the special case when clients have the same amount of data.
arXiv Detail & Related papers (2021-07-26T13:36:06Z)
- Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization [69.07420650261649]
We introduce a novel, simple, and powerful contrastive MI estimator named FLO.
Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently.
The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
arXiv Detail & Related papers (2021-07-02T15:20:41Z)
- Achieving Statistical Optimality of Federated Learning: Beyond Stationary Points [19.891597817559038]
Federated Learning (FL) is a promising framework that has great potential for privacy preservation and for lowering the computation load at the cloud.
Recent work raised concerns about two popular methods, FedAvg and FedProx: (1) their fixed points do not correspond to the stationary points of the original optimization problem, and (2) the common model found might not generalize well locally.
We show, in the general kernel regression setting, that both FedAvg and FedProx converge to the minimax-optimal error rates.
arXiv Detail & Related papers (2021-06-29T09:59:43Z)
- 1-Bit Compressive Sensing for Efficient Federated Learning Over the Air [32.14738452396869]
This paper develops and analyzes a communication-efficient scheme for federated learning (FL) over the air, which incorporates 1-bit compressive sensing (CS) into analog aggregation transmissions.
For scalable computing, we develop an efficient implementation that is suitable for large-scale networks.
Simulation results show that our proposed 1-bit CS based FL over the air achieves comparable performance to the ideal case.
arXiv Detail & Related papers (2021-03-30T03:50:31Z)
- Federated Composite Optimization [28.11253930828807]
Federated Learning (FL) is a distributed learning paradigm that scales on-device learning collaboratively and privately.
Standard FL algorithms such as FedAvg are primarily geared towards smooth unconstrained settings.
We propose a new primal-dual algorithm, Federated Dual Averaging (FedDualAvg), which, by employing a novel server dual averaging procedure, circumvents the curse of primal averaging.
arXiv Detail & Related papers (2020-11-17T06:54:06Z)
- A Unified Linear Speedup Analysis of Federated Averaging and Nesterov FedAvg [49.76940694847521]
Federated learning (FL) learns a model jointly from a set of participating devices without sharing each other's privately held data.
In this paper, we focus on Federated Averaging (FedAvg), one of the most popular and effective FL algorithms in use today.
We show that FedAvg enjoys linear speedup in each case, although with different convergence rates and communication efficiencies.
arXiv Detail & Related papers (2020-07-11T05:59:08Z)