PowerGossip: Practical Low-Rank Communication Compression in
Decentralized Deep Learning
- URL: http://arxiv.org/abs/2008.01425v2
- Date: Mon, 19 Oct 2020 15:07:50 GMT
- Title: PowerGossip: Practical Low-Rank Communication Compression in
Decentralized Deep Learning
- Authors: Thijs Vogels and Sai Praneeth Karimireddy and Martin Jaggi
- Abstract summary: We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by the PowerSGD algorithm for centralized deep learning, this algorithm uses power iteration steps to maximize the information transferred per bit.
- Score: 62.440827696638664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lossy gradient compression has become a practical tool to overcome the
communication bottleneck in centrally coordinated distributed training of
machine learning models. However, algorithms for decentralized training with
compressed communication over arbitrary connected networks have been more
complicated, requiring additional memory and hyperparameters. We introduce a
simple algorithm that directly compresses the model differences between
neighboring workers using low-rank linear compressors. Inspired by the PowerSGD
algorithm for centralized deep learning,
this algorithm uses power iteration steps to maximize the information
transferred per bit. We prove that our method requires no additional
hyperparameters, converges faster than prior methods, and is asymptotically
independent of both the network and the compression. Out of the box, these
compressors perform on par with state-of-the-art tuned compression algorithms
in a series of deep learning benchmarks.
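The abstract above describes compressing the model difference between neighboring workers with a low-rank linear compressor driven by power iteration. The NumPy sketch below is a minimal, illustrative rank-1 version of that idea and not the authors' reference implementation: the function name, the normalization, the random initialization of `q`, and the consensus rate of 0.5 are assumptions made for the example.

```python
# Minimal illustrative sketch (assumptions noted above): one rank-1
# power-iteration step on the model difference between two workers,
# used for a single gossip update.
import numpy as np

def compress_difference(x_self, x_neighbor, q):
    """One power-iteration step on the matrix-shaped model difference.

    Returns the rank-1 factors (p, q_new) the workers would exchange,
    plus the low-rank approximation both sides use for their update.
    """
    diff = x_self - x_neighbor          # quantity to be compressed
    p = diff @ q                        # left power step
    p /= np.linalg.norm(p) + 1e-12      # normalize (simple choice, assumed)
    q_new = diff.T @ p                  # right power step
    approx = np.outer(p, q_new)         # rank-1 approximation of `diff`
    return p, q_new, approx

# Toy usage: two workers move toward each other using only the rank-1
# approximation of their difference (one gossip step, consensus rate 0.5).
rng = np.random.default_rng(0)
x_a, x_b = rng.normal(size=(64, 32)), rng.normal(size=(64, 32))
q = rng.normal(size=32)                 # random init here for illustration
p, q, approx = compress_difference(x_a, x_b, q)
x_a = x_a - 0.5 * approx
x_b = x_b + 0.5 * approx
```

Because each round only exchanges the two factor vectors, the per-round communication scales with the sum of the matrix dimensions rather than their product. Note also that the products can be formed without ever exchanging full models, since (x_i - x_j) q = x_i q - x_j q, so each worker can compute its own matrix-vector product locally and send only the small resulting vector.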
Related papers
- Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
arXiv Detail & Related papers (2024-06-26T15:11:26Z)
- AdaGossip: Adaptive Consensus Step-size for Decentralized Deep Learning with Communication Compression [11.290935303784208]
AdaGossip is a novel technique that adaptively adjusts the consensus step-size based on the compressed model differences between neighboring agents.
Our experiments show that the proposed method achieves superior performance compared to the current state-of-the-art method for decentralized learning with communication compression.
arXiv Detail & Related papers (2024-04-09T00:43:45Z) - Accelerating Distributed Deep Learning using Lossless Homomorphic
Compression [17.654138014999326]
We introduce a novel compression algorithm that effectively merges worker-level compression with in-network aggregation.
We show up to a 6.33$\times$ improvement in aggregation throughput and a 3.74$\times$ increase in per-iteration training speed.
arXiv Detail & Related papers (2024-02-12T09:57:47Z)
- Supervised Compression for Resource-constrained Edge Computing Systems [26.676557573171618]
Full-scale deep neural networks are often too resource-intensive in terms of energy and storage.
This paper adopts ideas from knowledge distillation and neural image compression to compress intermediate feature representations more efficiently.
It achieves better supervised rate-distortion performance while also maintaining smaller end-to-end latency.
arXiv Detail & Related papers (2021-08-21T11:10:29Z)
- On Effects of Compression with Hyperdimensional Computing in Distributed Randomized Neural Networks [6.25118865553438]
We propose a model for distributed classification based on randomized neural networks and hyperdimensional computing.
In this work, we propose a more flexible approach to compression and compare it to conventional compression algorithms, dimensionality reduction, and quantization techniques.
arXiv Detail & Related papers (2021-06-17T22:02:40Z)
- An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Top-k, and DGC compressors, respectively (a generic threshold-sparsification sketch appears after this list).
arXiv Detail & Related papers (2021-01-26T13:06:00Z)
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
- Sparse Communication for Training Deep Networks [56.441077560085475]
Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models.
In this algorithm, each worker shares its local gradients with others and updates the parameters using the average gradients of all workers.
We study several compression schemes and identify how three key parameters affect the performance.
arXiv Detail & Related papers (2020-09-19T17:28:11Z)
- Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes LEAD, the first LinEAr convergent Decentralized algorithm with compression.
arXiv Detail & Related papers (2020-07-01T04:35:00Z)
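Several entries in the list above (SIDCo, DGC, Top-k) rely on threshold-based gradient sparsification. The sketch below illustrates that general family only; the plain magnitude-quantile threshold is an assumption made for the example and is not SIDCo's statistical threshold estimator.

```python
# Generic threshold-based gradient sparsification (illustrative only):
# keep roughly `density` of the entries with the largest magnitude and
# transmit their indices and values instead of the dense gradient.
import numpy as np

def sparsify_by_threshold(grad, density=0.01):
    threshold = np.quantile(np.abs(grad), 1.0 - density)
    mask = np.abs(grad) >= threshold
    return np.flatnonzero(mask), grad[mask]

# Toy usage: communicate roughly 1% of a gradient vector.
grad = np.random.default_rng(1).normal(size=10_000)
idx, vals = sparsify_by_threshold(grad, density=0.01)
print(f"sending {idx.size} of {grad.size} entries")
```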