CFedAvg: Achieving Efficient Communication and Fast Convergence in
Non-IID Federated Learning
- URL: http://arxiv.org/abs/2106.07155v1
- Date: Mon, 14 Jun 2021 04:27:19 GMT
- Title: CFedAvg: Achieving Efficient Communication and Fast Convergence in
Non-IID Federated Learning
- Authors: Haibo Yang, Jia Liu, Elizabeth S. Bentley
- Abstract summary: Federated learning (FL) is a prevailing distributed learning paradigm, where a large number of workers jointly learn a model without sharing their training data.
High communication costs could arise in FL due to large-scale (deep) learning models and bandwidth-constrained connections.
We introduce a communication-efficient algorithmic framework called CFedAvg for FL with non-i.i.d. datasets, which works with general (biased or unbiased) SNR-constrained compressors.
- Score: 8.702106020664612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning (FL) is a prevailing distributed learning paradigm, where
a large number of workers jointly learn a model without sharing their training
data. However, high communication costs could arise in FL due to large-scale
(deep) learning models and bandwidth-constrained connections. In this paper, we
introduce a communication-efficient algorithmic framework called CFedAvg for FL
with non-i.i.d. datasets, which works with general (biased or unbiased)
SNR-constrained compressors. We analyze the convergence rate of CFedAvg for
non-convex functions with constant and decaying learning rates. The CFedAvg
algorithm can achieve an $\mathcal{O}(1 / \sqrt{mKT} + 1 / T)$ convergence rate
with a constant learning rate, implying a linear speedup for convergence as the
number of workers increases, where $K$ is the number of local steps, $T$ is the
number of total communication rounds, and $m$ is the total worker number. This
matches the convergence rate of distributed/federated learning without
compression, thus achieving high communication efficiency while not sacrificing
learning accuracy in FL. Furthermore, we extend CFedAvg to cases with
heterogeneous local steps, which allows different workers to perform a
different number of local steps to better adapt to their own circumstances. The
interesting observation in general is that the noise/variance introduced by
compressors does not affect the overall convergence rate order for non-i.i.d.
FL. We verify the effectiveness of our CFedAvg algorithm on three datasets with
two gradient compression schemes of different compression ratios.
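To make the setup concrete, below is a minimal sketch of a compressed FedAvg-style loop in the spirit of the abstract: m workers run K local SGD steps on non-i.i.d. toy objectives, compress their model deltas, and the server averages the compressed deltas. Top-k sparsification with an error-feedback buffer stands in for a general (biased) SNR-constrained compressor; all objectives, names, and hyperparameters are illustrative assumptions, not the authors' exact CFedAvg algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, K, T = 50, 8, 5, 100           # dimension, workers, local steps, rounds
eta = 0.05                           # local learning rate

# Non-i.i.d. toy objectives: worker i minimizes 0.5*||x - b_i||^2 with a distinct b_i.
targets = [rng.normal(loc=3.0 * i / m, size=d) for i in range(m)]

def local_grad(i, x):
    return x - targets[i]

def top_k(v, k):
    """Biased top-k sparsifier: keep only the k largest-magnitude entries."""
    idx = np.argsort(np.abs(v))[-k:]
    out = np.zeros_like(v)
    out[idx] = v[idx]
    return out

x_global = np.zeros(d)
errors = [np.zeros(d) for _ in range(m)]    # per-worker error-feedback memory

for t in range(T):
    deltas = []
    for i in range(m):
        x = x_global.copy()
        for _ in range(K):                  # K local SGD steps
            x -= eta * local_grad(i, x)
        update = (x - x_global) + errors[i] # re-inject previously dropped mass
        compressed = top_k(update, k=5)     # what would actually be transmitted
        errors[i] = update - compressed     # remember what was not sent
        deltas.append(compressed)
    x_global += np.mean(deltas, axis=0)     # server averages compressed updates

print("distance to average target:", np.linalg.norm(x_global - np.mean(targets, axis=0)))
```

The error-feedback buffer queues whatever the compressor drops and sends it later, which is one common way biased compressors are made compatible with the kind of convergence guarantees the abstract describes.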
Related papers
- FedScalar: A Communication efficient Federated Learning [0.0]
Federated learning (FL) has gained considerable popularity for distributed machine learning.
FedScalar enables agents to communicate updates using a single scalar.
arXiv Detail & Related papers (2024-10-03T07:06:49Z)
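The FedScalar blurb above only states that agents communicate a single scalar per update. The snippet below is a hypothetical illustration of one way that could work, by projecting the update onto a random direction regenerated from a seed shared with the server; it is an assumption for illustration, not the paper's actual scheme.

```python
import numpy as np

d = 10_000
update = np.random.default_rng(1).normal(size=d)   # a worker's model update (illustrative)

shared_seed = 42                                    # assumed to be agreed with the server
direction = np.random.default_rng(shared_seed).normal(size=d)
direction /= np.linalg.norm(direction)

scalar = float(update @ direction)                  # the only value transmitted

# Server side: regenerate the same direction from the shared seed and
# build a rank-one surrogate of the update from the received scalar.
server_dir = np.random.default_rng(shared_seed).normal(size=d)
server_dir /= np.linalg.norm(server_dir)
reconstructed = scalar * server_dir

print("bytes sent:", 8, "instead of", update.nbytes)
```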
- Communication-efficient Vertical Federated Learning via Compressed Error Feedback [24.32409923443071]
Communication overhead is a known bottleneck in federated learning (FL).
We propose EFVFL, a compressed error-feedback method for training models over vertical federated networks.
EFVFL does not require a vanishing compression error for smooth nonconvex problems.
arXiv Detail & Related papers (2024-06-20T15:40:38Z)
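The EFVFL summary above hinges on compressed error feedback. The sketch below shows the standard error-feedback recipe on plain compressed gradient descent with a fixed top-k compressor, next to naive compression without feedback; it is a generic illustration of the mechanism, not the paper's vertical-FL algorithm, and the quadratic objective and constants are assumptions.

```python
import numpy as np

def top_k(v, k):
    """Keep only the k largest-magnitude coordinates (a biased compressor)."""
    idx = np.argsort(np.abs(v))[-k:]
    out = np.zeros_like(v)
    out[idx] = v[idx]
    return out

d, k, steps, lr = 100, 5, 500, 0.1
A = np.diag(np.linspace(1.0, 5.0, d))        # smooth quadratic: f(x) = 0.5 x^T A x
x_plain, x_ef, err = np.ones(d), np.ones(d), np.zeros(d)

for _ in range(steps):
    # Naive compressed GD: the dropped coordinates are simply lost each step.
    x_plain -= lr * top_k(A @ x_plain, k)
    # Error feedback: re-inject what previous compressions dropped before compressing again.
    g = lr * (A @ x_ef) + err
    sent = top_k(g, k)
    err = g - sent
    x_ef -= sent

print("no error feedback  :", np.linalg.norm(x_plain))
print("with error feedback:", np.linalg.norm(x_ef))
```

With error feedback, whatever a fixed-ratio compressor skips is queued rather than lost, which is the general property that lets a non-vanishing compression error coexist with convergence.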
- Fed-CVLC: Compressing Federated Learning Communications with Variable-Length Codes [54.18186259484828]
In the Federated Learning (FL) paradigm, a parameter server (PS) concurrently communicates with distributed participating clients for model collection, update aggregation, and model distribution over multiple rounds.
We show strong evidence that variable-length coding is beneficial for compression in FL.
We present Fed-CVLC (Federated Learning Compression with Variable-Length Codes), which fine-tunes the code length in response to the dynamics of model updates.
arXiv Detail & Related papers (2024-02-06T07:25:21Z)
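The Fed-CVLC blurb argues that variable-length codes help FL compression. The toy computation below shows why: quantized model updates tend to have a skewed symbol distribution, so the empirical entropy (the per-symbol cost an ideal variable-length code approaches) sits well below the fixed-length cost. The update distribution and quantizer are assumptions for illustration; the paper's adaptive code-length tuning is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
# A toy model update: most coordinates are near zero, a few are large, which is the
# kind of skewed value distribution that makes variable-length codes pay off.
update = rng.laplace(scale=0.01, size=100_000)

levels = 16                                    # 4-bit uniform quantizer
edges = np.linspace(update.min(), update.max(), levels + 1)
symbols = np.clip(np.digitize(update, edges) - 1, 0, levels - 1)

counts = np.bincount(symbols, minlength=levels)
probs = counts[counts > 0] / symbols.size
entropy_bits = float(-(probs * np.log2(probs)).sum())   # ideal variable-length cost

print(f"fixed-length cost : {np.log2(levels):.2f} bits/coordinate")
print(f"entropy (variable): {entropy_bits:.2f} bits/coordinate")
```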
- FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specific auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z)
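FedLALR is summarized as a heterogeneous local variant of AMSGrad in which each client adjusts its own learning rate. The sketch below shows a standard local AMSGrad routine run by two clients with different (hypothetical) learning rates before their models are averaged; the scheduling rule, objectives, and hyperparameters are illustrative, not the paper's.

```python
import numpy as np

def local_amsgrad(x, grad_fn, steps, lr, beta1=0.9, beta2=0.99, eps=1e-8):
    """Run `steps` AMSGrad iterations locally on one client, starting from x."""
    m = np.zeros_like(x)        # first-moment estimate
    v = np.zeros_like(x)        # second-moment estimate
    v_hat = np.zeros_like(x)    # running max of v (the AMSGrad correction)
    for _ in range(steps):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        v_hat = np.maximum(v_hat, v)
        x = x - lr * m / (np.sqrt(v_hat) + eps)   # adaptive per-coordinate step
    return x

# Two clients with different data (non-i.i.d.) and *different* local learning rates.
targets = {0: np.full(10, 2.0), 1: np.full(10, -1.0)}
client_lr = {0: 0.05, 1: 0.1}                     # client-specific rates (illustrative)

x_global = np.zeros(10)
local_models = [local_amsgrad(x_global.copy(), lambda x, b=targets[i]: x - b,
                              steps=50, lr=client_lr[i]) for i in (0, 1)]
x_global = np.mean(local_models, axis=0)
print(x_global[:3])
```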
- DFedADMM: Dual Constraints Controlled Model Inconsistency for Decentralized Federated Learning [52.83811558753284]
Decentralized federated learning (DFL) discards the central server and establishes a decentralized communication network.
Existing DFL methods still suffer from two major challenges: local inconsistency and local overfitting.
arXiv Detail & Related papers (2023-08-16T11:22:36Z)
- Federated Learning Using Variance Reduced Stochastic Gradient for Probabilistically Activated Agents [0.0]
This paper proposes an algorithm for Federated Learning (FL) with a two-layer structure that achieves both variance reduction and a faster convergence rate to an optimal solution in the setting where each agent has an arbitrary probability of selection in each iteration.
arXiv Detail & Related papers (2022-10-25T22:04:49Z)
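The entry above combines variance reduction with probabilistically activated agents. Below is a schematic of a generic SVRG-style control-variate gradient under Bernoulli participation, with each active agent's correction reweighted by 1/p_i to keep the estimate unbiased; the paper's two-layer structure is not reproduced, and all objectives and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_agents, rounds, lr = 20, 10, 200, 0.1
p_active = rng.uniform(0.3, 0.9, size=n_agents)        # per-agent activation probabilities

targets = [rng.normal(size=d) for _ in range(n_agents)]
grad = lambda i, x: x - targets[i]                      # agent i's local gradient

x = np.zeros(d)
snapshot = x.copy()                                     # reference point for variance reduction
full_grad_snap = np.mean([grad(i, snapshot) for i in range(n_agents)], axis=0)

for t in range(rounds):
    active = [i for i in range(n_agents) if rng.random() < p_active[i]]
    # SVRG-style estimate: corrections from active agents, reweighted by 1/p_i
    # so that the estimator stays unbiased despite random participation.
    corr = sum((grad(i, x) - grad(i, snapshot)) / p_active[i] for i in active)
    x -= lr * (full_grad_snap + corr / n_agents)
    if (t + 1) % 50 == 0:                               # periodically refresh the snapshot
        snapshot = x.copy()
        full_grad_snap = np.mean([grad(i, snapshot) for i in range(n_agents)], axis=0)

print("distance to optimum:", np.linalg.norm(x - np.mean(targets, axis=0)))
```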
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with the communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
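The SketchedAMSGrad entry reports an $O(k \log(d))$ per-iteration communication cost, which is characteristic of sketching a $d$-dimensional gradient into a small table. The snippet below shows a count-sketch, one standard primitive of that kind, with hypothetical sizes; the paper's exact sketching operator may differ.

```python
import numpy as np

d, k, reps = 10_000, 64, 7        # dimension, buckets per row, repetitions (order of log d)
rng = np.random.default_rng(0)

# Shared randomness: bucket assignments and signs for each repetition.
buckets = rng.integers(0, k, size=(reps, d))
signs = rng.choice([-1.0, 1.0], size=(reps, d))

def sketch(g):
    """Compress g (length d) into a reps x k table: O(k log d) numbers."""
    S = np.zeros((reps, k))
    for r in range(reps):
        np.add.at(S[r], buckets[r], signs[r] * g)
    return S

def unsketch(S):
    """Estimate each coordinate by the median of its sign-corrected bucket readings."""
    est = np.stack([signs[r] * S[r, buckets[r]] for r in range(reps)])
    return np.median(est, axis=0)

g = np.zeros(d)
g[:20] = rng.normal(scale=5.0, size=20)      # a sparse, heavy-hitter-style gradient
approx = unsketch(sketch(g))

top = np.argsort(-np.abs(g))[:5]
print("true:", np.round(g[top], 2))
print("est :", np.round(approx[top], 2))
```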
- Acceleration of Federated Learning with Alleviated Forgetting in Local Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy.
We propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage.
Our experiments demonstrate that FedReg significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
arXiv Detail & Related papers (2022-03-05T02:31:32Z)
- BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with Communication Compression [37.20712215269538]
Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications.
This paper proposes BEER, which adopts communication compression with gradient tracking, and shows that it converges at a faster rate.
arXiv Detail & Related papers (2022-01-31T16:14:09Z)
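BEER pairs communication compression with gradient tracking. The sketch below shows plain (uncompressed) gradient tracking on a small ring of nodes, the building block that BEER then compresses; the mixing matrix, objectives, and step size are illustrative choices, not the paper's configuration.

```python
import numpy as np

n, d, T, lr = 5, 10, 300, 0.1
rng = np.random.default_rng(0)
targets = [rng.normal(size=d) for _ in range(n)]       # node i minimizes 0.5*||x - b_i||^2
grad = lambda i, x: x - targets[i]

# Doubly stochastic mixing matrix for a ring: weight 1/3 to self and to each neighbor.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

X = np.zeros((n, d))                                   # one model copy per node
G = np.stack([grad(i, X[i]) for i in range(n)])        # gradient trackers y_i
prev_grads = G.copy()

for t in range(T):
    X = W @ X - lr * G                                 # consensus step + tracked-gradient step
    new_grads = np.stack([grad(i, X[i]) for i in range(n)])
    G = W @ G + new_grads - prev_grads                 # gradient-tracking recursion
    prev_grads = new_grads

# All nodes should agree and sit near the minimizer of the average objective.
print("consensus error    :", np.linalg.norm(X - X.mean(axis=0)))
print("distance to optimum:", np.linalg.norm(X.mean(axis=0) - np.mean(targets, axis=0)))
```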
- Achieving Linear Speedup with Partial Worker Participation in Non-IID Federated Learning [6.994020662415705]
Federated learning (FL) is a distributed machine learning architecture that leverages a large number of workers to jointly learn a model with decentralized data.
We show that the linear speedup for convergence is achievable under non-i.i.d. datasets with partial worker participation in FL.
arXiv Detail & Related papers (2021-01-27T04:38:27Z)
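The entry above concerns linear speedup under partial worker participation. A minimal sketch of that setting: the server uniformly samples a handful of workers each round and averages only their local updates, which keeps the averaged update an unbiased estimate of the full-participation one. The sizes and toy objectives below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, K, T, eta = 20, 30, 3, 200, 0.1
participate = 5                                    # workers sampled per round
targets = [rng.normal(size=d) for _ in range(m)]   # non-i.i.d. local objectives

x_global = np.zeros(d)
for t in range(T):
    chosen = rng.choice(m, size=participate, replace=False)   # uniform sampling keeps it unbiased
    deltas = []
    for i in chosen:
        x = x_global.copy()
        for _ in range(K):                         # K local SGD steps on worker i's objective
            x -= eta * (x - targets[i])
        deltas.append(x - x_global)
    x_global += np.mean(deltas, axis=0)            # average only the sampled workers

print("distance to average target:", np.linalg.norm(x_global - np.mean(targets, axis=0)))
```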
- Over-the-Air Federated Learning from Heterogeneous Data [107.05618009955094]
Federated learning (FL) is a framework for distributed learning of centralized models.
We develop a Convergent OTA FL (COTAF) algorithm which enhances the common local stochastic gradient descent (SGD) FL algorithm.
We numerically show that the precoding induced by COTAF notably improves the convergence rate and the accuracy of models trained via OTA FL.
arXiv Detail & Related papers (2020-09-27T08:28:25Z)
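Over-the-air FL lets simultaneous analog transmissions superimpose in the channel, so the server receives a noisy sum of the workers' updates. The sketch below mimics that with a shared precoding scale and additive Gaussian noise; the scaling rule is a simplified stand-in for illustration, not COTAF's exact precoder.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, noise_std, power = 10, 100, 0.5, 1.0

updates = [rng.normal(size=d) for _ in range(m)]      # local SGD updates (illustrative)

# Precoding: every worker scales its update so the transmitted signal meets a shared
# power budget; here the common scale is set from the largest update norm.
alpha = np.sqrt(power) / max(np.linalg.norm(u) for u in updates)
transmitted = [alpha * u for u in updates]

# The channel superimposes all transmissions and adds receiver noise.
received = np.sum(transmitted, axis=0) + rng.normal(scale=noise_std, size=d)

# The server undoes the precoding and averages.
aggregated = received / (alpha * m)
print("aggregation error:", np.linalg.norm(aggregated - np.mean(updates, axis=0)))
```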