Linear Convergent Decentralized Optimization with Compression
- URL: http://arxiv.org/abs/2007.00232v2
- Date: Thu, 18 Mar 2021 20:46:05 GMT
- Title: Linear Convergent Decentralized Optimization with Compression
- Authors: Xiaorui Liu, Yao Li, Rongrong Wang, Jiliang Tang, Ming Yan
- Abstract summary: Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes first underlineLinunderlineEAr convergent.
underlineDecentralized with compression, LEAD.
- Score: 50.44269451541387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Communication compression has become a key strategy to speed up distributed
optimization. However, existing decentralized algorithms with compression
mainly focus on compressing DGD-type algorithms. They are unsatisfactory in
terms of convergence rate, stability, and the capability to handle
heterogeneous data. Motivated by primal-dual algorithms, this paper proposes
the first \underline{L}in\underline{EA}r convergent \underline{D}ecentralized
algorithm with compression, LEAD. Our theory describes the coupled dynamics of
the inexact primal and dual update as well as compression error, and we provide
the first consensus error bound in such settings without assuming bounded
gradients. Experiments on convex problems validate our theoretical analysis,
and empirical study on deep neural nets shows that LEAD is applicable to
non-convex problems.
Related papers
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Lower Bounds and Accelerated Algorithms in Distributed Stochastic
Optimization with Communication Compression [31.107056382542417]
Communication compression is an essential strategy for alleviating communication overhead.
We propose NEOLITHIC, a nearly optimal algorithm for compression under mild conditions.
arXiv Detail & Related papers (2023-05-12T17:02:43Z) - Can Decentralized Stochastic Minimax Optimization Algorithms Converge
Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its application in a wide range machine learning.
This paper develops two novel decentralized minimax optimization algorithms for the non-strongly-nonconcave problem.
arXiv Detail & Related papers (2023-04-24T02:19:39Z) - Lower Bounds and Nearly Optimal Algorithms in Distributed Learning with
Communication Compression [33.217552987061474]
Communication compression is one of the most effective means of reducing communication.
Recent advances in distributed optimization and learning have shown that communication compression is one of the most effective means of reducing communication.
arXiv Detail & Related papers (2022-06-08T03:36:34Z) - On Arbitrary Compression for Decentralized Consensus and Stochastic
Optimization over Directed Networks [0.6526824510982799]
We propose an iterative-based algorithm that compresses messages according to a desired compression ratio.
Contrary to existing literature allow arbitrary compression ratios.
We show explicit convergence rates for decentralized optimization problems on smooth functions.
arXiv Detail & Related papers (2022-04-18T04:41:56Z) - Decentralized Composite Optimization with Compression [36.75785129001134]
We study the decentralized composite optimization problem with a potentially non-smooth component.
A convergent underlineDecentralized algorithm with compression, Prox-LEAD, is proposed.
Our theorems indicate that Prox-LEAD works with arbitrary compression precision, and it tremendously reduces the communication cost almost for free.
arXiv Detail & Related papers (2021-08-10T04:54:52Z) - A Linearly Convergent Algorithm for Decentralized Optimization: Sending
Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z) - PowerGossip: Practical Low-Rank Communication Compression in
Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by the PowerSGD for centralized deep learning, this algorithm uses power steps to maximize the information transferred per bit.
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.