Convergence and Privacy of Decentralized Nonconvex Optimization with
Gradient Clipping and Communication Compression
- URL: http://arxiv.org/abs/2305.09896v1
- Date: Wed, 17 May 2023 02:13:18 GMT
- Title: Convergence and Privacy of Decentralized Nonconvex Optimization with
Gradient Clipping and Communication Compression
- Authors: Boyue Li, Yuejie Chi
- Abstract summary: This paper takes a first step towards understanding the role of gradient clipping, a popular strategy in practice, in decentralized nonconvex optimization with communication compression.
We propose PORTER, which considers two variants of gradient clipping added before or after averaging a mini-batch of stochastic gradients.
- Score: 31.161598424963934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Achieving communication efficiency in decentralized machine learning has been
attracting significant attention, with communication compression recognized as
an effective technique in algorithm design. This paper takes a first step to
understand the role of gradient clipping, a popular strategy in practice, in
decentralized nonconvex optimization with communication compression. We propose
PORTER, which considers two variants of gradient clipping added before or after
taking a mini-batch of stochastic gradients, where the former variant PORTER-DP
allows local differential privacy analysis with additional Gaussian
perturbation, and the latter variant PORTER-GC helps to stabilize training. We
develop a novel analysis framework that establishes their convergence
guarantees without assuming the stringent bounded gradient assumption. To the
best of our knowledge, our work provides the first convergence analysis for
decentralized nonconvex optimization with gradient clipping and communication
compression, highlighting the trade-offs between convergence rate, compression
ratio, network connectivity, and privacy.
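The abstract distinguishes the two variants only by where the clipping operator is applied relative to mini-batch averaging. The sketch below is a minimal illustration of that distinction, assuming standard norm clipping and Gaussian perturbation; it omits PORTER's compression and decentralized mixing steps, and the function and parameter names (clip, c, sigma) are ours, not the paper's.

```python
import numpy as np

def clip(v, c):
    """Scale v down so that its Euclidean norm is at most c."""
    norm = np.linalg.norm(v)
    return v if norm <= c else v * (c / norm)

def dp_style_update(per_sample_grads, c, sigma, rng):
    """Clip BEFORE averaging the mini-batch, then add Gaussian noise
    (the placement that enables a local differential privacy analysis)."""
    clipped = [clip(g, c) for g in per_sample_grads]
    avg = np.mean(clipped, axis=0)
    return avg + rng.normal(0.0, sigma * c, size=avg.shape)

def gc_style_update(per_sample_grads, c):
    """Average the mini-batch first, then clip the result
    (the placement used to stabilize training)."""
    return clip(np.mean(per_sample_grads, axis=0), c)

# Toy usage with synthetic per-sample gradients.
rng = np.random.default_rng(0)
grads = [rng.normal(size=5) for _ in range(8)]
g_dp = dp_style_update(grads, c=1.0, sigma=0.5, rng=rng)
g_gc = gc_style_update(grads, c=1.0)
```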
Related papers
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated Decentralized Federated Learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z) - Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas (a generic sketch of one-bit and sparsifying compressors of this kind appears after this list).
arXiv Detail & Related papers (2024-05-17T21:17:27Z) - Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity [92.1840862558718]
We introduce MARINA-P, a novel method for downlink compression, employing a collection of correlated compressors.
We show that MARINA-P with permutation compressors can achieve a server-to-worker communication complexity that improves with the number of workers.
We introduce M3, a method combining MARINA-P with uplink compression and a momentum step, achieving bidirectional compression with provable improvements in total communication complexity as the number of workers increases.
arXiv Detail & Related papers (2024-02-09T13:58:33Z) - Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem.
We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents.
arXiv Detail & Related papers (2023-04-07T13:41:08Z) - On Arbitrary Compression for Decentralized Consensus and Stochastic
Optimization over Directed Networks [0.6526824510982799]
We propose an iterative algorithm that compresses messages according to a desired compression ratio.
Contrary to existing literature, our approach allows arbitrary compression ratios.
We show explicit convergence rates for decentralized optimization problems on smooth functions.
arXiv Detail & Related papers (2022-04-18T04:41:56Z) - Communication-Efficient Distributed SGD with Compressed Sensing [24.33697801661053]
We consider large-scale distributed optimization over a set of edge devices connected to a central server.
Inspired by recent advances in federated learning, we propose a distributed stochastic gradient descent (SGD)-type algorithm that exploits the sparsity of the gradient, when possible, to reduce the communication burden.
We conduct theoretical analysis on the convergence of our algorithm in the presence of noise perturbation incurred by the communication channels, and also conduct numerical experiments to corroborate its effectiveness.
arXiv Detail & Related papers (2021-12-15T02:10:45Z) - Communication-Efficient Federated Learning via Quantized Compressed
Sensing [82.10695943017907]
The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server.
Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression.
We demonstrate that the framework achieves almost identical performance to the case without compression.
arXiv Detail & Related papers (2021-11-30T02:13:54Z) - Communication-Compressed Adaptive Gradient Method for Distributed
Nonconvex Optimization [21.81192774458227]
One of the major bottlenecks is the large communication cost between the central server and the local workers.
Our proposed distributed learning framework features an effective gradient compression strategy.
arXiv Detail & Related papers (2021-11-01T04:54:55Z) - Compressed Gradient Tracking for Decentralized Optimization Over General Directed Networks [17.49477125920901]
We propose two communication efficient decentralized optimization algorithms over a general directed multi-agent network.
The first algorithm, CPP, combines the gradient tracking Push-Pull method with communication compression.
The second algorithm is a broadcast-like version of CPP (B-CPP), and it also achieves a linear convergence rate under the same conditions on the objective functions.
arXiv Detail & Related papers (2021-06-14T08:53:30Z) - Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes the first LinEAr convergent Decentralized algorithm with compression, LEAD.
arXiv Detail & Related papers (2020-07-01T04:35:00Z)
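Communication compression recurs throughout the list above, most often via sparsification or one-bit/sign quantization. The following is a generic sketch of those two primitives under our own simplifying assumptions; it is not the compressor of any specific paper listed here, and the names top_k_sparsify and one_bit_quantize are illustrative.

```python
import numpy as np

def top_k_sparsify(v, k):
    """Keep only the k largest-magnitude entries of v and zero out the rest.
    Transmitting the result costs k values plus k indices."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def one_bit_quantize(v):
    """Scaled sign compressor: one bit per entry plus a single scaling float."""
    scale = np.mean(np.abs(v))
    return scale * np.sign(v)

# Toy usage on a synthetic gradient vector.
rng = np.random.default_rng(0)
g = rng.normal(size=10)
g_sparse = top_k_sparsify(g, k=3)
g_onebit = one_bit_quantize(g)
```

Both operators are contractive (the compression error stays within a bounded fraction of the input norm), which is the property that convergence analyses of compressed methods typically rely on.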