CFedAvg: Achieving Efficient Communication and Fast Convergence in
Non-IID Federated Learning
- URL: http://arxiv.org/abs/2106.07155v1
- Date: Mon, 14 Jun 2021 04:27:19 GMT
- Title: CFedAvg: Achieving Efficient Communication and Fast Convergence in
Non-IID Federated Learning
- Authors: Haibo Yang, Jia Liu, Elizabeth S. Bentley
- Abstract summary: Federated learning (FL) is a prevailing distributed learning paradigm, where a large number of workers jointly learn a model without sharing their training data.
High communication costs could arise in FL due to large-scale (deep) learning models and bandwidth-constrained connections.
We introduce a communication-efficient algorithmic framework called CFedAvg for FL with non-i.i.d. datasets, which works with general (biased or unbiased) SNR-constrained compressors.
- Score: 8.702106020664612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning (FL) is a prevailing distributed learning paradigm, where
a large number of workers jointly learn a model without sharing their training
data. However, high communication costs could arise in FL due to large-scale
(deep) learning models and bandwidth-constrained connections. In this paper, we
introduce a communication-efficient algorithmic framework called CFedAvg for FL
with non-i.i.d. datasets, which works with general (biased or unbiased)
SNR-constrained compressors. We analyze the convergence rate of CFedAvg for
non-convex functions with constant and decaying learning rates. The CFedAvg
algorithm can achieve an $\mathcal{O}(1 / \sqrt{mKT} + 1 / T)$ convergence rate
with a constant learning rate, implying a linear speedup for convergence as the
number of workers increases, where $K$ is the number of local steps, $T$ is the
number of total communication rounds, and $m$ is the total worker number. This
matches the convergence rate of distributed/federated learning without
compression, thus achieving high communication efficiency while not sacrificing
learning accuracy in FL. Furthermore, we extend CFedAvg to cases with
heterogeneous local steps, which allows different workers to perform a
different number of local steps to better adapt to their own circumstances. The
interesting observation in general is that the noise/variance introduced by
compressors does not affect the overall convergence rate order for non-i.i.d.
FL. We verify the effectiveness of our CFedAvg algorithm on three datasets with
two gradient compression schemes of different compression ratios.
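To make the setup concrete, below is a minimal sketch of a compressed FedAvg-style loop in the spirit of the abstract: m workers run K local SGD steps on non-i.i.d. toy objectives, compress their model deltas, and the server averages the compressed deltas. Top-k sparsification with an error-feedback buffer stands in for a general (biased) SNR-constrained compressor; all objectives, names, and hyperparameters are illustrative assumptions, not the authors' exact CFedAvg algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, K, T = 50, 8, 5, 100           # dimension, workers, local steps, rounds
eta = 0.05                           # local learning rate

# Non-i.i.d. toy objectives: worker i minimizes 0.5*||x - b_i||^2 with a distinct b_i.
targets = [rng.normal(loc=3.0 * i / m, size=d) for i in range(m)]

def local_grad(i, x):
    return x - targets[i]

def top_k(v, k):
    """Biased top-k sparsifier: keep only the k largest-magnitude entries."""
    idx = np.argsort(np.abs(v))[-k:]
    out = np.zeros_like(v)
    out[idx] = v[idx]
    return out

x_global = np.zeros(d)
errors = [np.zeros(d) for _ in range(m)]    # per-worker error-feedback memory

for t in range(T):
    deltas = []
    for i in range(m):
        x = x_global.copy()
        for _ in range(K):                  # K local SGD steps
            x -= eta * local_grad(i, x)
        update = (x - x_global) + errors[i] # re-inject previously dropped mass
        compressed = top_k(update, k=5)     # what would actually be transmitted
        errors[i] = update - compressed     # remember what was not sent
        deltas.append(compressed)
    x_global += np.mean(deltas, axis=0)     # server averages compressed updates

print("distance to average target:", np.linalg.norm(x_global - np.mean(targets, axis=0)))
```

The error-feedback buffer queues whatever the compressor drops and sends it later, which is one common way biased compressors are made compatible with the kind of convergence guarantees the abstract describes.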
Related papers
- FedScalar: A Communication efficient Federated Learning [0.0]
Federated learning (FL) has gained considerable popularity for distributed machine learning.
FedScalar enables agents to communicate updates using a single scalar.
arXiv Detail & Related papers (2024-10-03T07:06:49Z)
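The FedScalar blurb above only states that agents communicate a single scalar per update. The snippet below is a hypothetical illustration of one way that could work, by projecting the update onto a random direction regenerated from a seed shared with the server; it is an assumption for illustration, not the paper's actual scheme.

```python
import numpy as np

d = 10_000
update = np.random.default_rng(1).normal(size=d)   # a worker's model update (illustrative)

shared_seed = 42                                    # assumed to be agreed with the server
direction = np.random.default_rng(shared_seed).normal(size=d)
direction /= np.linalg.norm(direction)

scalar = float(update @ direction)                  # the only value transmitted

# Server side: regenerate the same direction from the shared seed and
# build a rank-one surrogate of the update from the received scalar.
server_dir = np.random.default_rng(shared_seed).normal(size=d)
server_dir /= np.linalg.norm(server_dir)
reconstructed = scalar * server_dir

print("bytes sent:", 8, "instead of", update.nbytes)
```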
- Communication-efficient Vertical Federated Learning via Compressed Error Feedback [24.32409923443071]
Communication overhead is a known bottleneck in federated learning (FL).
We propose EFVFL, a compressed error-feedback method for training models over vertical federated networks.
EFVFL does not require a vanishing compression error for smooth nonconvex problems.
arXiv Detail & Related papers (2024-06-20T15:40:38Z)
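The EFVFL summary above hinges on compressed error feedback. The sketch below shows the standard error-feedback recipe on plain compressed gradient descent with a fixed top-k compressor, next to naive compression without feedback; it is a generic illustration of the mechanism, not the paper's vertical-FL algorithm, and the quadratic objective and constants are assumptions.

```python
import numpy as np

def top_k(v, k):
    """Keep only the k largest-magnitude coordinates (a biased compressor)."""
    idx = np.argsort(np.abs(v))[-k:]
    out = np.zeros_like(v)
    out[idx] = v[idx]
    return out

d, k, steps, lr = 100, 5, 500, 0.1
A = np.diag(np.linspace(1.0, 5.0, d))        # smooth quadratic: f(x) = 0.5 x^T A x
x_plain, x_ef, err = np.ones(d), np.ones(d), np.zeros(d)

for _ in range(steps):
    # Naive compressed GD: the dropped coordinates are simply lost each step.
    x_plain -= lr * top_k(A @ x_plain, k)
    # Error feedback: re-inject what previous compressions dropped before compressing again.
    g = lr * (A @ x_ef) + err
    sent = top_k(g, k)
    err = g - sent
    x_ef -= sent

print("no error feedback  :", np.linalg.norm(x_plain))
print("with error feedback:", np.linalg.norm(x_ef))
```

With error feedback, whatever a fixed-ratio compressor skips is queued rather than lost, which is the general property that lets a non-vanishing compression error coexist with convergence.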
- Fed-CVLC: Compressing Federated Learning Communications with Variable-Length Codes [54.18186259484828]
In the Federated Learning (FL) paradigm, a parameter server (PS) concurrently communicates with distributed participating clients for model collection, update aggregation, and model distribution over multiple rounds.
We show strong evidence that variable-length coding is beneficial for compression in FL.
We present Fed-CVLC (Federated Learning Compression with Variable-Length Codes), which fine-tunes the code length in response to the dynamics of model updates.
arXiv Detail & Related papers (2024-02-06T07:25:21Z)
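The Fed-CVLC blurb argues that variable-length codes help FL compression. The toy computation below shows why: quantized model updates tend to have a skewed symbol distribution, so the empirical entropy (the per-symbol cost an ideal variable-length code approaches) sits well below the fixed-length cost. The update distribution and quantizer are assumptions for illustration; the paper's adaptive code-length tuning is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
# A toy model update: most coordinates are near zero, a few are large, which is the
# kind of skewed value distribution that makes variable-length codes pay off.
update = rng.laplace(scale=0.01, size=100_000)

levels = 16                                    # 4-bit uniform quantizer
edges = np.linspace(update.min(), update.max(), levels + 1)
symbols = np.clip(np.digitize(update, edges) - 1, 0, levels - 1)

counts = np.bincount(symbols, minlength=levels)
probs = counts[counts > 0] / symbols.size
entropy_bits = float(-(probs * np.log2(probs)).sum())   # ideal variable-length cost

print(f"fixed-length cost : {np.log2(levels):.2f} bits/coordinate")
print(f"entropy (variable): {entropy_bits:.2f} bits/coordinate")
```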
- FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specific auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z)
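FedLALR is summarized as a heterogeneous local variant of AMSGrad in which each client adjusts its own learning rate. The sketch below shows a standard local AMSGrad routine run by two clients with different (hypothetical) learning rates before their models are averaged; the scheduling rule, objectives, and hyperparameters are illustrative, not the paper's.

```python
import numpy as np

def local_amsgrad(x, grad_fn, steps, lr, beta1=0.9, beta2=0.99, eps=1e-8):
    """Run `steps` AMSGrad iterations locally on one client, starting from x."""
    m = np.zeros_like(x)        # first-moment estimate
    v = np.zeros_like(x)        # second-moment estimate
    v_hat = np.zeros_like(x)    # running max of v (the AMSGrad correction)
    for _ in range(steps):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        v_hat = np.maximum(v_hat, v)
        x = x - lr * m / (np.sqrt(v_hat) + eps)   # adaptive per-coordinate step
    return x

# Two clients with different data (non-i.i.d.) and *different* local learning rates.
targets = {0: np.full(10, 2.0), 1: np.full(10, -1.0)}
client_lr = {0: 0.05, 1: 0.1}                     # client-specific rates (illustrative)

x_global = np.zeros(10)
local_models = [local_amsgrad(x_global.copy(), lambda x, b=targets[i]: x - b,
                              steps=50, lr=client_lr[i]) for i in (0, 1)]
x_global = np.mean(local_models, axis=0)
print(x_global[:3])
```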
- DFedADMM: Dual Constraints Controlled Model Inconsistency for Decentralized Federated Learning [52.83811558753284]
Decentralized federated learning (DFL) discards the central server and establishes a decentralized communication network.
Existing DFL methods still suffer from two major challenges: local inconsistency and local overfitting.
arXiv Detail & Related papers (2023-08-16T11:22:36Z)
- Federated Learning Using Variance Reduced Stochastic Gradient for Probabilistically Activated Agents [0.0]
This paper proposes an algorithm for Federated Learning (FL) with a two-layer structure that achieves both variance reduction and a faster convergence rate to an optimal solution in the setting where each agent has an arbitrary probability of selection in each iteration.
arXiv Detail & Related papers (2022-10-25T22:04:49Z)
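The entry above combines variance reduction with probabilistically activated agents. Below is a schematic of a generic SVRG-style control-variate gradient under Bernoulli participation, with each active agent's correction reweighted by 1/p_i to keep the estimate unbiased; the paper's two-layer structure is not reproduced, and all objectives and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_agents, rounds, lr = 20, 10, 200, 0.1
p_active = rng.uniform(0.3, 0.9, size=n_agents)        # per-agent activation probabilities

targets = [rng.normal(size=d) for _ in range(n_agents)]
grad = lambda i, x: x - targets[i]                      # agent i's local gradient

x = np.zeros(d)
snapshot = x.copy()                                     # reference point for variance reduction
full_grad_snap = np.mean([grad(i, snapshot) for i in range(n_agents)], axis=0)

for t in range(rounds):
    active = [i for i in range(n_agents) if rng.random() < p_active[i]]
    # SVRG-style estimate: corrections from active agents, reweighted by 1/p_i
    # so that the estimator stays unbiased despite random participation.
    corr = sum((grad(i, x) - grad(i, snapshot)) / p_active[i] for i in active)
    x -= lr * (full_grad_snap + corr / n_agents)
    if (t + 1) % 50 == 0:                               # periodically refresh the snapshot
        snapshot = x.copy()
        full_grad_snap = np.mean([grad(i, snapshot) for i in range(n_agents)], axis=0)

print("distance to optimum:", np.linalg.norm(x - np.mean(targets, axis=0)))
```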
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with the communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
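The SketchedAMSGrad entry reports an $O(k \log(d))$ per-iteration communication cost, which is characteristic of sketching a $d$-dimensional gradient into a small table. The snippet below shows a count-sketch, one standard primitive of that kind, with hypothetical sizes; the paper's exact sketching operator may differ.

```python
import numpy as np

d, k, reps = 10_000, 64, 7        # dimension, buckets per row, repetitions (order of log d)
rng = np.random.default_rng(0)

# Shared randomness: bucket assignments and signs for each repetition.
buckets = rng.integers(0, k, size=(reps, d))
signs = rng.choice([-1.0, 1.0], size=(reps, d))

def sketch(g):
    """Compress g (length d) into a reps x k table: O(k log d) numbers."""
    S = np.zeros((reps, k))
    for r in range(reps):
        np.add.at(S[r], buckets[r], signs[r] * g)
    return S

def unsketch(S):
    """Estimate each coordinate by the median of its sign-corrected bucket readings."""
    est = np.stack([signs[r] * S[r, buckets[r]] for r in range(reps)])
    return np.median(est, axis=0)

g = np.zeros(d)
g[:20] = rng.normal(scale=5.0, size=20)      # a sparse, heavy-hitter-style gradient
approx = unsketch(sketch(g))

top = np.argsort(-np.abs(g))[:5]
print("true:", np.round(g[top], 2))
print("est :", np.round(approx[top], 2))
```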
- Acceleration of Federated Learning with Alleviated Forgetting in Local Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy.
We propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage.
Our experiments demonstrate that FedReg significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
arXiv Detail & Related papers (2022-03-05T02:31:32Z)
- BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with Communication Compression [37.20712215269538]
Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications.
This paper proposes BEER, which adopts communication compression with gradient tracking, and shows that it converges at a faster rate.
arXiv Detail & Related papers (2022-01-31T16:14:09Z)
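BEER pairs communication compression with gradient tracking. The sketch below shows plain (uncompressed) gradient tracking on a small ring of nodes, the building block that BEER then compresses; the mixing matrix, objectives, and step size are illustrative choices, not the paper's configuration.

```python
import numpy as np

n, d, T, lr = 5, 10, 300, 0.1
rng = np.random.default_rng(0)
targets = [rng.normal(size=d) for _ in range(n)]       # node i minimizes 0.5*||x - b_i||^2
grad = lambda i, x: x - targets[i]

# Doubly stochastic mixing matrix for a ring: weight 1/3 to self and to each neighbor.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

X = np.zeros((n, d))                                   # one model copy per node
G = np.stack([grad(i, X[i]) for i in range(n)])        # gradient trackers y_i
prev_grads = G.copy()

for t in range(T):
    X = W @ X - lr * G                                 # consensus step + tracked-gradient step
    new_grads = np.stack([grad(i, X[i]) for i in range(n)])
    G = W @ G + new_grads - prev_grads                 # gradient-tracking recursion
    prev_grads = new_grads

# All nodes should agree and sit near the minimizer of the average objective.
print("consensus error    :", np.linalg.norm(X - X.mean(axis=0)))
print("distance to optimum:", np.linalg.norm(X.mean(axis=0) - np.mean(targets, axis=0)))
```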
- Achieving Linear Speedup with Partial Worker Participation in Non-IID Federated Learning [6.994020662415705]
Federated learning (FL) is a distributed machine learning architecture that leverages a large number of workers to jointly learn a model with decentralized data.
We show that the linear speedup for convergence is achievable under non-i.i.d. datasets with partial worker participation in FL.
arXiv Detail & Related papers (2021-01-27T04:38:27Z)
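The entry above concerns linear speedup under partial worker participation. A minimal sketch of that setting: the server uniformly samples a handful of workers each round and averages only their local updates, which keeps the averaged update an unbiased estimate of the full-participation one. The sizes and toy objectives below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, K, T, eta = 20, 30, 3, 200, 0.1
participate = 5                                    # workers sampled per round
targets = [rng.normal(size=d) for _ in range(m)]   # non-i.i.d. local objectives

x_global = np.zeros(d)
for t in range(T):
    chosen = rng.choice(m, size=participate, replace=False)   # uniform sampling keeps it unbiased
    deltas = []
    for i in chosen:
        x = x_global.copy()
        for _ in range(K):                         # K local SGD steps on worker i's objective
            x -= eta * (x - targets[i])
        deltas.append(x - x_global)
    x_global += np.mean(deltas, axis=0)            # average only the sampled workers

print("distance to average target:", np.linalg.norm(x_global - np.mean(targets, axis=0)))
```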
- Over-the-Air Federated Learning from Heterogeneous Data [107.05618009955094]
Federated learning (FL) is a framework for distributed learning of centralized models.
We develop a Convergent OTA FL (COTAF) algorithm which enhances the common local stochastic gradient descent (SGD) FL algorithm.
We numerically show that the precoding induced by COTAF notably improves the convergence rate and the accuracy of models trained via OTA FL.
arXiv Detail & Related papers (2020-09-27T08:28:25Z)
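Over-the-air FL lets simultaneous analog transmissions superimpose in the channel, so the server receives a noisy sum of the workers' updates. The sketch below mimics that with a shared precoding scale and additive Gaussian noise; the scaling rule is a simplified stand-in for illustration, not COTAF's exact precoder.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, noise_std, power = 10, 100, 0.5, 1.0

updates = [rng.normal(size=d) for _ in range(m)]      # local SGD updates (illustrative)

# Precoding: every worker scales its update so the transmitted signal meets a shared
# power budget; here the common scale is set from the largest update norm.
alpha = np.sqrt(power) / max(np.linalg.norm(u) for u in updates)
transmitted = [alpha * u for u in updates]

# The channel superimposes all transmissions and adds receiver noise.
received = np.sum(transmitted, axis=0) + rng.normal(scale=noise_std, size=d)

# The server undoes the precoding and averages.
aggregated = received / (alpha * m)
print("aggregation error:", np.linalg.norm(aggregated - np.mean(updates, axis=0)))
```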