Distributed Sparse SGD with Majority Voting
- URL: http://arxiv.org/abs/2011.06495v1
- Date: Thu, 12 Nov 2020 17:06:36 GMT
- Title: Distributed Sparse SGD with Majority Voting
- Authors: Kerem Ozfatura and Emre Ozfatura and Deniz Gunduz
- Abstract summary: We introduce a majority voting based sparse communication strategy for distributed learning.
We show that it is possible to achieve up to x4000 compression without any loss in the test accuracy.
- Score: 5.32836690371986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed learning, particularly variants of distributed stochastic
gradient descent (DSGD), are widely employed to speed up training by leveraging
computational resources of several workers. However, in practice, communication
delay becomes a bottleneck due to the significant amount of information that
needs to be exchanged between the workers and the parameter server. One of the
most efficient strategies to mitigate the communication bottleneck is top-K
sparsification. However, top-K sparsification requires additional communication
load to represent the sparsity pattern, and the mismatch between the sparsity
patterns of the workers prevents exploitation of efficient communication
protocols. To address these issues, we introduce a novel majority voting based
sparse communication strategy, in which the workers first seek a consensus on
the structure of the sparse representation. This strategy provides a
significant reduction in the communication load and allows using the same
sparsity level in both communication directions. Through extensive simulations
on the CIFAR-10 dataset, we show that it is possible to achieve up to x4000
compression without any loss in the test accuracy.
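As a rough sketch of the strategy described in the abstract, the NumPy snippet below shows one plausible aggregation round: each worker votes only for the indices of its local top-K entries, the parameter server keeps the K most-voted coordinates as a shared sparsity pattern, and the averaged update is restricted to that same pattern in both communication directions. The concrete voting rule, error accumulation, and scaling used in the paper differ in detail, and all function names here are hypothetical.

```python
import numpy as np

def local_topk_votes(grad, k):
    """A worker votes for the indices of its k largest-magnitude entries."""
    return np.argsort(np.abs(grad))[-k:]

def majority_vote_mask(all_votes, dim, k):
    """The server tallies the votes and keeps the k most-voted coordinates."""
    counts = np.zeros(dim, dtype=int)
    for votes in all_votes:
        counts[votes] += 1
    mask = np.zeros(dim, dtype=bool)
    mask[np.argsort(counts)[-k:]] = True   # consensus sparsity pattern
    return mask

def sparse_aggregate(worker_grads, k):
    """One illustrative round: workers vote on a shared mask, then only the
    agreed coordinates are averaged (same sparsity in both directions)."""
    dim = worker_grads[0].size
    votes = [local_topk_votes(g, k) for g in worker_grads]  # uplink: indices only
    mask = majority_vote_mask(votes, dim, k)                # broadcast once
    # each worker would transmit just g[mask]; emulated here with dense arrays
    avg = np.mean([np.where(mask, g, 0.0) for g in worker_grads], axis=0)
    return avg, mask

# toy example: 4 workers, 10-dimensional gradients, keep k = 2 coordinates
rng = np.random.default_rng(0)
grads = [rng.normal(size=10) for _ in range(4)]
avg, mask = sparse_aggregate(grads, k=2)
print("shared sparsity pattern:", np.flatnonzero(mask))
```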
Related papers
- Communication-Efficient Federated Knowledge Graph Embedding with Entity-Wise Top-K Sparsification [49.66272783945571]
Federated Knowledge Graphs Embedding learning (FKGE) encounters challenges in communication efficiency stemming from the considerable size of parameters and extensive communication rounds.
We propose bidirectional communication-efficient FedS based on Entity-Wise Top-K Sparsification strategy.
arXiv Detail & Related papers (2024-06-19T05:26:02Z)
- Estimation Network Design framework for efficient distributed optimization [3.3148826359547514]
This paper introduces Estimation Network Design (END), a graph theoretical language for the analysis and design of distributed iterations.
END algorithms can be tuned to exploit the sparsity of specific problem instances, reducing communication overhead and minimizing redundancy.
In particular, we study the sparsity-aware version of many established methods, including ADMM, AugDGM and Push-Sum DGD.
arXiv Detail & Related papers (2024-04-23T17:59:09Z)
- Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem.
We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents.
arXiv Detail & Related papers (2023-04-07T13:41:08Z)
- Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification in the Presence of Data Heterogeneity [60.791736094073]
Communication overhead has become one of the major bottlenecks in the distributed training of deep neural networks.
We propose a magnitude-driven sparsification scheme, which addresses the non-convergence issue of SIGNSGD.
The proposed scheme is validated through experiments on Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
arXiv Detail & Related papers (2023-02-19T17:42:35Z)
- Fundamental Limits of Communication Efficiency for Model Aggregation in Distributed Learning: A Rate-Distortion Approach [54.311495894129585]
We study the limit of communication cost of model aggregation in distributed learning from a rate-distortion perspective.
It is found that the communication gain by exploiting the correlation between worker nodes is significant for SignSGD.
arXiv Detail & Related papers (2022-06-28T13:10:40Z)
- Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization [21.81192774458227]
One of the major bottlenecks is the large communication cost between the central server and the local workers.
Our proposed distributed learning framework features an effective gradient compression strategy.
arXiv Detail & Related papers (2021-11-01T04:54:55Z)
- Communication-Efficient Federated Learning via Robust Distributed Mean Estimation [16.41391088542669]
Federated learning relies on algorithms such as distributed (mini-batch) SGD, where multiple clients compute their gradients and send them to a central coordinator for averaging and updating the model.
DRIVE is a recent state of the art algorithm that compresses gradients using one bit per coordinate (with some lower-order overhead).
In this technical report, we generalize DRIVE to support any bandwidth constraint as well as extend it to support heterogeneous client resources and make it robust to packet loss.
arXiv Detail & Related papers (2021-08-19T17:59:21Z)
- Time-Correlated Sparsification for Communication-Efficient Federated Learning [6.746400031322727]
Federated learning (FL) enables multiple clients to collaboratively train a shared model without disclosing their local datasets.
We introduce a novel time-correlated sparsification scheme, which seeks a certain correlation between the sparse representations used at consecutive iterations in FL.
We show that TCS can achieve centralized training accuracy with 100 times sparsification, and up to 2000 times reduction in the communication load when employed together with quantization.
arXiv Detail & Related papers (2021-01-21T20:15:55Z)
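The TCS scheme summarized in the previous entry correlates the sparsity pattern with the one used in the preceding iteration. One plausible reading of that idea, sketched below in NumPy, reuses most of last round's indices and admits only a few fresh coordinates chosen by magnitude; the exact selection rule and error accumulation in TCS differ, and the function name is hypothetical.

```python
import numpy as np

def time_correlated_mask(grad, prev_idx, k, k_new):
    """Illustrative time-correlated top-K selection: keep (k - k_new) of the
    previous round's indices (those still largest in magnitude) and admit
    only k_new fresh coordinates, so consecutive masks overlap heavily."""
    keep = prev_idx[np.argsort(np.abs(grad[prev_idx]))[-(k - k_new):]]
    rest = np.setdiff1d(np.arange(grad.size), keep)
    fresh = rest[np.argsort(np.abs(grad[rest]))[-k_new:]]
    return np.sort(np.concatenate([keep, fresh]))

# toy usage: 12-dimensional gradients, budget k = 4, at most 1 new index per round
rng = np.random.default_rng(1)
idx = np.argsort(np.abs(rng.normal(size=12)))[-4:]   # initial top-K mask
for _ in range(3):
    g = rng.normal(size=12)
    idx = time_correlated_mask(g, idx, k=4, k_new=1)
    print("active coordinates:", idx)
```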
- Sparse Communication for Training Deep Networks [56.441077560085475]
Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models.
In this algorithm, each worker shares its local gradients with others and updates the parameters using the average gradients of all workers.
We study several compression schemes and identify how three key parameters affect the performance.
arXiv Detail & Related papers (2020-09-19T17:28:11Z)
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
Communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than standard error feedback for non-convex distributed problems.
We also propose DEFA to accelerate the generalization of DEF, which shows better bounds than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z)
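Several of the sparsification methods listed above rely on error feedback: whatever the compressor discards is stored locally and re-injected into the next round's gradient. The snippet below is a generic error-feedback loop under that standard formulation, shown with a simple top-K compressor; it is not the DEF/DEFA algorithms of the last entry, and the class and function names are illustrative.

```python
import numpy as np

def topk_sparsify(vec, k):
    """Keep only the k largest-magnitude entries; zero out the rest."""
    out = np.zeros_like(vec)
    idx = np.argsort(np.abs(vec))[-k:]
    out[idx] = vec[idx]
    return out

class ErrorFeedbackWorker:
    """Generic error-feedback loop: the part of the gradient dropped by
    compression is remembered locally and added back at the next round."""
    def __init__(self, dim):
        self.memory = np.zeros(dim)

    def compress(self, grad, k):
        corrected = grad + self.memory        # re-inject past compression error
        sparse = topk_sparsify(corrected, k)  # what actually gets transmitted
        self.memory = corrected - sparse      # remember what was dropped
        return sparse

# toy usage: one worker, 8-dimensional gradients, keep k = 2 per round
rng = np.random.default_rng(2)
worker = ErrorFeedbackWorker(dim=8)
for step in range(3):
    sent = worker.compress(rng.normal(size=8), k=2)
    print(f"round {step}: transmitted {np.count_nonzero(sent)} coordinates")
```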
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.