Distributed Sparse SGD with Majority Voting
- URL: http://arxiv.org/abs/2011.06495v1
- Date: Thu, 12 Nov 2020 17:06:36 GMT
- Title: Distributed Sparse SGD with Majority Voting
- Authors: Kerem Ozfatura and Emre Ozfatura and Deniz Gunduz
- Abstract summary: We introduce a majority voting based sparse communication strategy for distributed learning.
We show that it is possible to achieve up to x4000 compression without any loss in the test accuracy.
- Score: 5.32836690371986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed learning, particularly variants of distributed stochastic
gradient descent (DSGD), are widely employed to speed up training by leveraging
computational resources of several workers. However, in practice, communication
delay becomes a bottleneck due to the significant amount of information that
needs to be exchanged between the workers and the parameter server. One of the
most efficient strategies to mitigate the communication bottleneck is top-K
sparsification. However, top-K sparsification requires additional communication
load to represent the sparsity pattern, and the mismatch between the sparsity
patterns of the workers prevents exploitation of efficient communication
protocols. To address these issues, we introduce a novel majority voting based
sparse communication strategy, in which the workers first seek a consensus on
the structure of the sparse representation. This strategy provides a
significant reduction in the communication load and allows using the same
sparsity level in both communication directions. Through extensive simulations
on the CIFAR-10 dataset, we show that it is possible to achieve up to x4000
compression without any loss in the test accuracy.
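As a rough sketch of the strategy described in the abstract, the NumPy snippet below shows one plausible aggregation round: each worker votes only for the indices of its local top-K entries, the parameter server keeps the K most-voted coordinates as a shared sparsity pattern, and the averaged update is restricted to that same pattern in both communication directions. The concrete voting rule, error accumulation, and scaling used in the paper differ in detail, and all function names here are hypothetical.

```python
import numpy as np

def local_topk_votes(grad, k):
    """A worker votes for the indices of its k largest-magnitude entries."""
    return np.argsort(np.abs(grad))[-k:]

def majority_vote_mask(all_votes, dim, k):
    """The server tallies the votes and keeps the k most-voted coordinates."""
    counts = np.zeros(dim, dtype=int)
    for votes in all_votes:
        counts[votes] += 1
    mask = np.zeros(dim, dtype=bool)
    mask[np.argsort(counts)[-k:]] = True   # consensus sparsity pattern
    return mask

def sparse_aggregate(worker_grads, k):
    """One illustrative round: workers vote on a shared mask, then only the
    agreed coordinates are averaged (same sparsity in both directions)."""
    dim = worker_grads[0].size
    votes = [local_topk_votes(g, k) for g in worker_grads]  # uplink: indices only
    mask = majority_vote_mask(votes, dim, k)                # broadcast once
    # each worker would transmit just g[mask]; emulated here with dense arrays
    avg = np.mean([np.where(mask, g, 0.0) for g in worker_grads], axis=0)
    return avg, mask

# toy example: 4 workers, 10-dimensional gradients, keep k = 2 coordinates
rng = np.random.default_rng(0)
grads = [rng.normal(size=10) for _ in range(4)]
avg, mask = sparse_aggregate(grads, k=2)
print("shared sparsity pattern:", np.flatnonzero(mask))
```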
Related papers
- Communication-Efficient Federated Knowledge Graph Embedding with Entity-Wise Top-K Sparsification [49.66272783945571]
Federated Knowledge Graphs Embedding learning (FKGE) encounters challenges in communication efficiency stemming from the considerable size of parameters and extensive communication rounds.
We propose bidirectional communication-efficient FedS based on Entity-Wise Top-K Sparsification strategy.
arXiv Detail & Related papers (2024-06-19T05:26:02Z)
- Estimation Network Design framework for efficient distributed optimization [3.3148826359547514]
This paper introduces Estimation Network Design (END), a graph theoretical language for the analysis and design of distributed iterations.
END algorithms can be tuned to exploit the sparsity of specific problem instances, reducing communication overhead and minimizing redundancy.
In particular, we study the sparsity-aware version of many established methods, including ADMM, AugDGM and Push-Sum DGD.
arXiv Detail & Related papers (2024-04-23T17:59:09Z)
- Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem.
We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents.
arXiv Detail & Related papers (2023-04-07T13:41:08Z)
- Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification in the Presence of Data Heterogeneity [60.791736094073]
Communication overhead has become one of the major bottlenecks in the distributed training of deep neural networks.
We propose a magnitude-driven sparsification scheme, which addresses the non-convergence issue of SIGNSGD.
The proposed scheme is validated through experiments on Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
arXiv Detail & Related papers (2023-02-19T17:42:35Z)
- Fundamental Limits of Communication Efficiency for Model Aggregation in Distributed Learning: A Rate-Distortion Approach [54.311495894129585]
We study the limit of communication cost of model aggregation in distributed learning from a rate-distortion perspective.
It is found that the communication gain by exploiting the correlation between worker nodes is significant for SignSGD.
arXiv Detail & Related papers (2022-06-28T13:10:40Z)
- Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization [21.81192774458227]
One of the major bottlenecks is the large communication cost between the central server and the local workers.
Our proposed distributed learning framework features an effective gradient compression strategy.
arXiv Detail & Related papers (2021-11-01T04:54:55Z)
- Communication-Efficient Federated Learning via Robust Distributed Mean Estimation [16.41391088542669]
Federated learning relies on algorithms such as distributed (mini-batch) SGD, where multiple clients compute their gradients and send them to a central coordinator for averaging and updating the model.
DRIVE is a recent state of the art algorithm that compresses gradients using one bit per coordinate (with some lower-order overhead).
In this technical report, we generalize DRIVE to support any bandwidth constraint as well as extend it to support heterogeneous client resources and make it robust to packet loss.
arXiv Detail & Related papers (2021-08-19T17:59:21Z)
- Time-Correlated Sparsification for Communication-Efficient Federated Learning [6.746400031322727]
Federated learning (FL) enables multiple clients to collaboratively train a shared model without disclosing their local datasets.
We introduce a novel time-correlated sparsification scheme, which seeks a certain correlation between the sparse representations used at consecutive iterations in FL.
We show that TCS can achieve centralized training accuracy with 100 times sparsification, and up to 2000 times reduction in the communication load when employed together with quantization.
arXiv Detail & Related papers (2021-01-21T20:15:55Z)
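The TCS scheme summarized in the previous entry correlates the sparsity pattern with the one used in the preceding iteration. One plausible reading of that idea, sketched below in NumPy, reuses most of last round's indices and admits only a few fresh coordinates chosen by magnitude; the exact selection rule and error accumulation in TCS differ, and the function name is hypothetical.

```python
import numpy as np

def time_correlated_mask(grad, prev_idx, k, k_new):
    """Illustrative time-correlated top-K selection: keep (k - k_new) of the
    previous round's indices (those still largest in magnitude) and admit
    only k_new fresh coordinates, so consecutive masks overlap heavily."""
    keep = prev_idx[np.argsort(np.abs(grad[prev_idx]))[-(k - k_new):]]
    rest = np.setdiff1d(np.arange(grad.size), keep)
    fresh = rest[np.argsort(np.abs(grad[rest]))[-k_new:]]
    return np.sort(np.concatenate([keep, fresh]))

# toy usage: 12-dimensional gradients, budget k = 4, at most 1 new index per round
rng = np.random.default_rng(1)
idx = np.argsort(np.abs(rng.normal(size=12)))[-4:]   # initial top-K mask
for _ in range(3):
    g = rng.normal(size=12)
    idx = time_correlated_mask(g, idx, k=4, k_new=1)
    print("active coordinates:", idx)
```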
- Sparse Communication for Training Deep Networks [56.441077560085475]
Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models.
In this algorithm, each worker shares its local gradients with others and updates the parameters using the average gradients of all workers.
We study several compression schemes and identify how three key parameters affect the performance.
arXiv Detail & Related papers (2020-09-19T17:28:11Z)
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
Communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than standard error feedback for non-convex distributed problems.
We also propose DEFA to accelerate the generalization of DEF, which shows better bounds than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z)
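Several of the sparsification methods listed above rely on error feedback: whatever the compressor discards is stored locally and re-injected into the next round's gradient. The snippet below is a generic error-feedback loop under that standard formulation, shown with a simple top-K compressor; it is not the DEF/DEFA algorithms of the last entry, and the class and function names are illustrative.

```python
import numpy as np

def topk_sparsify(vec, k):
    """Keep only the k largest-magnitude entries; zero out the rest."""
    out = np.zeros_like(vec)
    idx = np.argsort(np.abs(vec))[-k:]
    out[idx] = vec[idx]
    return out

class ErrorFeedbackWorker:
    """Generic error-feedback loop: the part of the gradient dropped by
    compression is remembered locally and added back at the next round."""
    def __init__(self, dim):
        self.memory = np.zeros(dim)

    def compress(self, grad, k):
        corrected = grad + self.memory        # re-inject past compression error
        sparse = topk_sparsify(corrected, k)  # what actually gets transmitted
        self.memory = corrected - sparse      # remember what was dropped
        return sparse

# toy usage: one worker, 8-dimensional gradients, keep k = 2 per round
rng = np.random.default_rng(2)
worker = ErrorFeedbackWorker(dim=8)
for step in range(3):
    sent = worker.compress(rng.normal(size=8), k=2)
    print(f"round {step}: transmitted {np.count_nonzero(sent)} coordinates")
```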
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.