Time-Correlated Sparsification for Communication-Efficient Federated
Learning
- URL: http://arxiv.org/abs/2101.08837v1
- Date: Thu, 21 Jan 2021 20:15:55 GMT
- Title: Time-Correlated Sparsification for Communication-Efficient Federated
Learning
- Authors: Emre Ozfatura and Kerem Ozfatura and Deniz Gunduz
- Abstract summary: Federated learning (FL) enables multiple clients to collaboratively train a shared model without disclosing their local datasets.
We introduce a novel time-correlated sparsification scheme, which seeks a certain correlation between the sparse representations used at consecutive iterations in FL.
We show that TCS can achieve centralized training accuracy with 100 times sparsification, and up to 2000 times reduction in the communication load when employed together with quantization.
- Score: 6.746400031322727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning (FL) enables multiple clients to collaboratively train a
shared model without disclosing their local datasets. This is achieved by
exchanging local model updates with the help of a parameter server (PS).
However, as the size of the trained models grows, the communication load of
the iterative exchanges between the clients and the PS often becomes a
performance bottleneck. Sparse communication is often employed to reduce this
load, where only a small subset of the model updates is communicated from the
clients to the PS. In this paper, we introduce a novel time-correlated
sparsification (TCS) scheme, which builds upon the notion that a sparse
communication framework can be viewed as identifying the most significant
elements of the underlying model. Hence, TCS
seeks a certain correlation between the sparse representations used at
consecutive iterations in FL, so that the overhead due to encoding and
transmission of the sparse representation can be significantly reduced without
compromising the test accuracy. Through extensive simulations on the CIFAR-10
dataset, we show that TCS can achieve centralized training accuracy with 100
times sparsification, and up to 2000 times reduction in the communication load
when employed together with quantization.
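A minimal sketch of the idea, in Python with NumPy, is given below. It assumes a plain top-k sparsifier whose index mask is largely carried over from the previous round, so that only the small set of refreshed positions has to be encoded and transmitted each round; the function and parameter names (tcs_sparsify, refresh_frac) are hypothetical and this is not the authors' exact algorithm.

import numpy as np

def tcs_sparsify(update, prev_mask, k, refresh_frac=0.1):
    """Keep k entries of `update`, reusing most of the previous round's mask.

    Only about refresh_frac * k positions are newly selected each round, so
    the per-round index overhead is roughly refresh_frac * k instead of k.
    """
    d = update.size
    if prev_mask is None:
        # First round: ordinary top-k selection by magnitude.
        kept = np.argpartition(np.abs(update), -k)[-k:]
    else:
        n_new = max(1, int(refresh_frac * k))
        prev_idx = np.flatnonzero(prev_mask)
        # Carry over the strongest (k - n_new) positions of the previous mask.
        keep_old = prev_idx[np.argsort(np.abs(update[prev_idx]))[-(k - n_new):]]
        # Refresh n_new positions with the largest-magnitude entries outside it.
        outside = np.setdiff1d(np.arange(d), prev_idx, assume_unique=True)
        add_new = outside[np.argpartition(np.abs(update[outside]), -n_new)[-n_new:]]
        kept = np.concatenate([keep_old, add_new])
    mask = np.zeros(d, dtype=bool)
    mask[kept] = True
    return np.where(mask, update, 0.0), mask

In such a scheme the entries dropped by the mask would typically be accumulated locally and added to the next round's update (error feedback), a standard companion to sparsified communication.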
Related papers
- FedComLoc: Communication-Efficient Distributed Training of Sparse and Quantized Models [56.21666819468249]
Federated Learning (FL) has garnered increasing attention due to its unique characteristic of allowing heterogeneous clients to process their private data locally and interact with a central server.
We introduce FedComLoc, integrating practical and effective compression into Scaffnew to further enhance communication efficiency.
arXiv Detail & Related papers (2024-03-14T22:29:59Z)
- Fed-CVLC: Compressing Federated Learning Communications with Variable-Length Codes [54.18186259484828]
In the Federated Learning (FL) paradigm, a parameter server (PS) concurrently communicates with distributed participating clients for model collection, update aggregation, and model distribution over multiple rounds.
We show strong evidence that variable-length coding is beneficial for compression in FL.
We present Fed-CVLC (Federated Learning Compression with Variable-Length Codes), which fine-tunes the code length in response to the dynamics of model updates.
arXiv Detail & Related papers (2024-02-06T07:25:21Z)
- Communication-Efficient Federated Learning through Adaptive Weight Clustering and Server-Side Distillation [10.541541376305245]
Federated Learning (FL) is a promising technique for the collaborative training of deep neural networks across multiple devices.
FL is hindered by excessive communication costs due to repeated server-client communication during training.
We propose FedCompress, a novel approach that combines dynamic weight clustering and server-side knowledge distillation.
arXiv Detail & Related papers (2024-01-25T14:49:15Z)
- Fundamental Limits of Communication Efficiency for Model Aggregation in Distributed Learning: A Rate-Distortion Approach [54.311495894129585]
We study the limit of communication cost of model aggregation in distributed learning from a rate-distortion perspective.
It is found that the communication gain by exploiting the correlation between worker nodes is significant for SignSGD.
arXiv Detail & Related papers (2022-06-28T13:10:40Z)
- On the Convergence Time of Federated Learning Over Wireless Networks Under Imperfect CSI [28.782485580296374]
We propose a training process that takes channel statistics as a bias to minimize the convergence time under imperfect CSI.
We also examine the trade-off between the number of clients involved in the training process and the model accuracy under different fading regimes.
arXiv Detail & Related papers (2021-04-01T08:30:45Z)
- Adaptive Quantization of Model Updates for Communication-Efficient Federated Learning [75.45968495410047]
Communication of model updates between client nodes and the central aggregating server is a major bottleneck in federated learning.
Gradient quantization is an effective way of reducing the number of bits required to communicate each model update (a generic sketch of this primitive appears after this list).
We propose an adaptive quantization strategy called AdaFL that aims to achieve communication efficiency as well as a low error floor.
arXiv Detail & Related papers (2021-02-08T19:14:21Z)
- CosSGD: Nonlinear Quantization for Communication-efficient Federated Learning [62.65937719264881]
Federated learning facilitates learning across clients without transferring local data on these clients to a central server.
We propose a nonlinear quantization scheme for compressed gradient descent that can be easily utilized in federated learning.
Our system significantly reduces the communication cost by up to three orders of magnitude, while maintaining convergence and accuracy of the training process.
arXiv Detail & Related papers (2020-12-15T12:20:28Z)
- Distributed Sparse SGD with Majority Voting [5.32836690371986]
We introduce a majority voting based sparse communication strategy for distributed learning.
We show that it is possible to achieve up to 4000x compression without any loss in the test accuracy.
arXiv Detail & Related papers (2020-11-12T17:06:36Z)
- Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism [56.78673028601739]
We propose a compression framework called Dynamic Communication Thresholding (DCT) for communication-efficient hybrid training.
DCT reduces communication by at least 100x and 20x during DP and MP, respectively.
It improves end-to-end training time for a state-of-the-art industrial recommender model by 37%, without any loss in performance.
arXiv Detail & Related papers (2020-10-18T01:44:42Z)
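Several of the papers listed above (Fed-CVLC, AdaFL, CosSGD) refine how model updates are quantized before transmission. For reference, the sketch below shows the generic stochastic uniform quantizer that such schemes typically build on, for example by adapting the bit width, using nonlinear levels, or varying the code length; it is not taken from any of the listed papers, and all names are illustrative.

import numpy as np

def stochastic_quantize(x, num_bits=4):
    """Quantize a tensor to 2**num_bits uniform levels with unbiased rounding."""
    levels = 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / levels if x_max > x_min else 1.0
    normalized = (x - x_min) / scale            # values in [0, levels]
    lower = np.floor(normalized)
    prob_up = normalized - lower                # round up with this probability
    q = lower + (np.random.rand(*x.shape) < prob_up)
    # The client sends q (num_bits per entry) plus the scalars x_min and scale.
    return q.astype(np.uint8), x_min, scale

def dequantize(q, x_min, scale):
    """Server-side reconstruction; unbiased, i.e. E[dequantize(q)] equals x."""
    return q.astype(np.float32) * scale + x_min

The abstract above reports further gains (up to a 2000-fold reduction in communication load) when a quantizer of this general kind is applied on top of sparsification.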