PowerGossip: Practical Low-Rank Communication Compression in
Decentralized Deep Learning
- URL: http://arxiv.org/abs/2008.01425v2
- Date: Mon, 19 Oct 2020 15:07:50 GMT
- Title: PowerGossip: Practical Low-Rank Communication Compression in
Decentralized Deep Learning
- Authors: Thijs Vogels and Sai Praneeth Karimireddy and Martin Jaggi
- Abstract summary: We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by the PowerSGD algorithm for centralized deep learning, this algorithm uses power iteration steps to maximize the information transferred per bit.
- Score: 62.440827696638664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lossy gradient compression has become a practical tool to overcome the
communication bottleneck in centrally coordinated distributed training of
machine learning models. However, algorithms for decentralized training with
compressed communication over arbitrary connected networks have been more
complicated, requiring additional memory and hyperparameters. We introduce a
simple algorithm that directly compresses the model differences between
neighboring workers using low-rank linear compressors. Inspired by the PowerSGD
algorithm for centralized deep learning,
this algorithm uses power iteration steps to maximize the information
transferred per bit. We prove that our method requires no additional
hyperparameters, converges faster than prior methods, and is asymptotically
independent of both the network and the compression. Out of the box, these
compressors perform on par with state-of-the-art tuned compression algorithms
in a series of deep learning benchmarks.
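The abstract above describes compressing the model difference between neighboring workers with a low-rank linear compressor driven by power iteration. The NumPy sketch below is a minimal, illustrative rank-1 version of that idea and not the authors' reference implementation: the function name, the normalization, the random initialization of `q`, and the consensus rate of 0.5 are assumptions made for the example.

```python
# Minimal illustrative sketch (assumptions noted above): one rank-1
# power-iteration step on the model difference between two workers,
# used for a single gossip update.
import numpy as np

def compress_difference(x_self, x_neighbor, q):
    """One power-iteration step on the matrix-shaped model difference.

    Returns the rank-1 factors (p, q_new) the workers would exchange,
    plus the low-rank approximation both sides use for their update.
    """
    diff = x_self - x_neighbor          # quantity to be compressed
    p = diff @ q                        # left power step
    p /= np.linalg.norm(p) + 1e-12      # normalize (simple choice, assumed)
    q_new = diff.T @ p                  # right power step
    approx = np.outer(p, q_new)         # rank-1 approximation of `diff`
    return p, q_new, approx

# Toy usage: two workers move toward each other using only the rank-1
# approximation of their difference (one gossip step, consensus rate 0.5).
rng = np.random.default_rng(0)
x_a, x_b = rng.normal(size=(64, 32)), rng.normal(size=(64, 32))
q = rng.normal(size=32)                 # random init here for illustration
p, q, approx = compress_difference(x_a, x_b, q)
x_a = x_a - 0.5 * approx
x_b = x_b + 0.5 * approx
```

Because each round only exchanges the two factor vectors, the per-round communication scales with the sum of the matrix dimensions rather than their product. Note also that the products can be formed without ever exchanging full models, since (x_i - x_j) q = x_i q - x_j q, so each worker can compute its own matrix-vector product locally and send only the small resulting vector.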
Related papers
- Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
arXiv Detail & Related papers (2024-06-26T15:11:26Z)
- AdaGossip: Adaptive Consensus Step-size for Decentralized Deep Learning with Communication Compression [11.290935303784208]
AdaGossip is a novel technique that adaptively adjusts the consensus step-size based on the compressed model differences between neighboring agents.
Our experiments show that the proposed method achieves superior performance compared to the current state-of-the-art method for decentralized learning with communication compression.
arXiv Detail & Related papers (2024-04-09T00:43:45Z) - Accelerating Distributed Deep Learning using Lossless Homomorphic
Compression [17.654138014999326]
We introduce a novel compression algorithm that effectively merges worker-level compression with in-network aggregation.
We show up to a 6.33$\times$ improvement in aggregation throughput and a 3.74$\times$ increase in per-iteration training speed.
arXiv Detail & Related papers (2024-02-12T09:57:47Z)
- Supervised Compression for Resource-constrained Edge Computing Systems [26.676557573171618]
Full-scale deep neural networks are often too resource-intensive in terms of energy and storage.
This paper adopts ideas from knowledge distillation and neural image compression to compress intermediate feature representations more efficiently.
It achieves better supervised rate-distortion performance while also maintaining smaller end-to-end latency.
arXiv Detail & Related papers (2021-08-21T11:10:29Z)
- On Effects of Compression with Hyperdimensional Computing in Distributed Randomized Neural Networks [6.25118865553438]
We propose a model for distributed classification based on randomized neural networks and hyperdimensional computing.
In this work, we propose a more flexible approach to compression and compare it to conventional compression algorithms, dimensionality reduction, and quantization techniques.
arXiv Detail & Related papers (2021-06-17T22:02:40Z)
- An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Top-k, and DGC compressors, respectively (a generic threshold-sparsification sketch appears after this list).
arXiv Detail & Related papers (2021-01-26T13:06:00Z)
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
- Sparse Communication for Training Deep Networks [56.441077560085475]
Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models.
In this algorithm, each worker shares its local gradients with others and updates the parameters using the average gradients of all workers.
We study several compression schemes and identify how three key parameters affect the performance.
arXiv Detail & Related papers (2020-09-19T17:28:11Z)
- Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes LEAD, the first LinEAr convergent Decentralized algorithm with compression.
arXiv Detail & Related papers (2020-07-01T04:35:00Z)
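Several entries in the list above (SIDCo, DGC, Top-k) rely on threshold-based gradient sparsification. The sketch below illustrates that general family only; the plain magnitude-quantile threshold is an assumption made for the example and is not SIDCo's statistical threshold estimator.

```python
# Generic threshold-based gradient sparsification (illustrative only):
# keep roughly `density` of the entries with the largest magnitude and
# transmit their indices and values instead of the dense gradient.
import numpy as np

def sparsify_by_threshold(grad, density=0.01):
    threshold = np.quantile(np.abs(grad), 1.0 - density)
    mask = np.abs(grad) >= threshold
    return np.flatnonzero(mask), grad[mask]

# Toy usage: communicate roughly 1% of a gradient vector.
grad = np.random.default_rng(1).normal(size=10_000)
idx, vals = sparsify_by_threshold(grad, density=0.01)
print(f"sending {idx.size} of {grad.size} entries")
```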