A Linearly Convergent Algorithm for Decentralized Optimization: Sending
Less Bits for Free!
- URL: http://arxiv.org/abs/2011.01697v1
- Date: Tue, 3 Nov 2020 13:35:53 GMT
- Title: A Linearly Convergent Algorithm for Decentralized Optimization: Sending
Less Bits for Free!
- Authors: Dmitry Kovalev and Anastasia Koloskova and Martin Jaggi and Peter
Richtarik and Sebastian U. Stich
- Abstract summary: Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve these problems without any increase in the number of communications compared to the uncompressed baseline.
- Score: 72.31332210635524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized optimization methods enable on-device training of machine
learning models without a central coordinator. In many scenarios, communication between devices is energy-demanding and time-consuming, and it forms the bottleneck of the entire system.
We propose a new randomized first-order method which tackles the
communication bottleneck by applying randomized compression operators to the
communicated messages. By combining our scheme with a new variance reduction technique that progressively reduces the adverse effect of the injected quantization noise over the iterations, we obtain the first scheme that
converges linearly on strongly convex decentralized problems while using
compressed communication only.
We prove that our method can solve these problems without any increase in the number of communications compared to a baseline that does not perform any communication compression, while still allowing for a significant compression factor that depends on the conditioning of the problem and the topology of the network. Our key theoretical findings are supported by numerical experiments.
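To make the two ingredients named in the abstract concrete, the following is a minimal, self-contained Python/NumPy sketch of (i) a randomized sparsifying compressor and (ii) a CHOCO-style consensus step that compresses only the difference between a node's iterate and its publicly known estimate, so the injected compression noise shrinks as the estimates catch up. The node count, ring topology, step size `gamma`, and all function names are illustrative assumptions; this is not the algorithm proposed in the paper.

```python
# Illustrative sketch, NOT the paper's algorithm: a randomized sparsifier and a
# CHOCO-style compressed-gossip step in which only differences are compressed,
# so the injected compression noise shrinks as the public estimates catch up.
import numpy as np

rng = np.random.default_rng(0)

def rand_k(x, k):
    """Randomized sparsifier: keep k randomly chosen coordinates, zero the rest."""
    q = np.zeros_like(x)
    idx = rng.choice(x.size, size=k, replace=False)
    q[idx] = x[idx]
    return q

# Toy setup: n nodes on a ring, each holding a local vector; W is the
# doubly stochastic gossip matrix of the ring (illustrative choice).
n, d, k = 8, 20, 4
X = rng.standard_normal((n, d))   # local iterates x_i
Xhat = np.zeros((n, d))           # publicly known compressed estimates of the x_i
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

gamma = 0.05                      # consensus step size; assumed small enough
err0 = np.linalg.norm(X - X.mean(axis=0))
for _ in range(1000):
    # Each node transmits a compressed *difference*; as Xhat approaches X the
    # transmitted messages, and hence the injected noise, become smaller.
    Q = np.stack([rand_k(X[i] - Xhat[i], k) for i in range(n)])
    Xhat = Xhat + Q                         # all nodes update the public estimates
    X = X + gamma * (W - np.eye(n)) @ Xhat  # gossip step on the estimates

print(f"consensus error: {err0:.3f} -> {np.linalg.norm(X - X.mean(axis=0)):.3f}")
```

The sketch only illustrates why compressing differences makes the quantization noise vanish near consensus; in the paper this compressed communication mechanism is coupled with gradient steps and a variance reduction technique so that linear convergence on strongly convex problems is retained.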
Related papers
- Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
arXiv Detail & Related papers (2024-06-26T15:11:26Z) - Communication-Efficient Decentralized Federated Learning via One-Bit
Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
arXiv Detail & Related papers (2023-08-31T12:22:40Z) - On Arbitrary Compression for Decentralized Consensus and Stochastic
Optimization over Directed Networks [0.6526824510982799]
We propose an iterative algorithm that compresses messages according to a desired compression ratio.
Contrary to the existing literature, it allows arbitrary compression ratios.
We show explicit convergence rates for decentralized optimization problems on smooth functions.
arXiv Detail & Related papers (2022-04-18T04:41:56Z) - Distributed Methods with Compressed Communication for Solving
Variational Inequalities, with Theoretical Guarantees [115.08148491584997]
We present the first theoretically grounded distributed methods for solving variational inequalities and saddle point problems using compressed communication: MASHA1 and MASHA2.
The new algorithms support bidirectional compression and can also be modified for the stochastic setting with batches and for federated learning with partial participation of clients.
arXiv Detail & Related papers (2021-10-07T10:04:32Z) - Decentralized Composite Optimization with Compression [36.75785129001134]
We study the decentralized composite optimization problem with a potentially non-smooth component.
A convergent decentralized algorithm with compression, Prox-LEAD, is proposed.
Our theorems indicate that Prox-LEAD works with arbitrary compression precision, and it tremendously reduces the communication cost almost for free.
arXiv Detail & Related papers (2021-08-10T04:54:52Z) - PowerGossip: Practical Low-Rank Communication Compression in
Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by the PowerSGD algorithm for centralized deep learning, this algorithm uses power iteration steps to maximize the information transferred per bit (see the rank-1 power-step sketch after this list).
arXiv Detail & Related papers (2020-08-04T09:14:52Z) - Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes the first LinEAr convergent Decentralized algorithm with compression, LEAD.
arXiv Detail & Related papers (2020-07-01T04:35:00Z) - Communication-efficient Variance-reduced Stochastic Gradient Descent [0.0]
We consider the problem of communication efficient distributed optimization.
In particular, we focus on the variance-reduced gradient and propose a novel approach to make it communication-efficient.
Comprehensive theoretical and numerical analyses on real datasets reveal that our algorithm can significantly reduce the communication complexity, by as much as 95%, with almost no noticeable penalty.
arXiv Detail & Related papers (2020-03-10T13:22:16Z)