BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with
Communication Compression
- URL: http://arxiv.org/abs/2201.13320v1
- Date: Mon, 31 Jan 2022 16:14:09 GMT
- Title: BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with
Communication Compression
- Authors: Haoyu Zhao, Boyue Li, Zhize Li, Peter Richt\'arik, Yuejie Chi
- Abstract summary: Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications.
This paper proposes BEER, which adopts communication with gradient tracking, shows it converges at a faster rate.
- Score: 37.20712215269538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Communication efficiency has been widely recognized as the bottleneck for
large-scale decentralized machine learning applications in multi-agent or
federated environments. To tackle the communication bottleneck, there have been
many efforts to design communication-compressed algorithms for decentralized
nonconvex optimization, where the clients are only allowed to communicate a
small amount of quantized information (aka bits) with their neighbors over a
predefined graph topology. Despite significant efforts, the state-of-the-art
algorithm in the nonconvex setting still suffers from a slower rate of
convergence $O((G/T)^{2/3})$ compared with their uncompressed counterpart,
where $G$ measures the data heterogeneity across different clients, and $T$ is
the number of communication rounds. This paper proposes BEER, which adopts
communication compression with gradient tracking, and shows it converges at a
faster rate of $O(1/T)$. This significantly improves over the state-of-the-art
rate, by matching the rate without compression even under arbitrary data
heterogeneity. Numerical experiments are also provided to corroborate our
theory and confirm the practical superiority of BEER in the data heterogeneous
regime.
Related papers
- Towards Faster Decentralized Stochastic Optimization with Communication Compression [27.484212303346816]
We introduce MoTEF, a novel approach that integrates communication with Momentum Tracking and Error Feedback.
Our analysis demonstrates that MoTEF integrates most of the desired properties, and significantly existing methods under data.
arXiv Detail & Related papers (2024-05-30T14:51:57Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (emphi.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(frac1sqrtnT + frac1(k/d)2 T)$ with the communication cost of $O(k log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD
for Communication Efficient Nonconvex Distributed Learning [58.79085525115987]
Local methods are one of the promising approaches to reduce communication time.
We show that the communication complexity is better than non-local methods when the local datasets is smaller than the smoothness local loss.
arXiv Detail & Related papers (2022-02-12T15:12:17Z) - Faster Rates for Compressed Federated Learning with Client-Variance
Reduction [23.169998435268504]
We show that COFIG and FRECON converge within $O(frac(1+omega)sqrtNSepsilon2)$ communication rounds.
In the convex setting COFIG converges within $O(frac(1+omega)sqrtNSepsilon2)$ communication rounds, which, to the best of our knowledge, is the first result.
arXiv Detail & Related papers (2021-12-24T16:28:18Z) - Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $varepsilon$-solution is achieved in $tildemathcalrhoObig(sqrtfracbeta/mu (1-)log1/varepsilonbig)$ number of communications steps.
This rate matches (up to poly-log factors) for the first time lower complexity communication bounds of distributed gossip-algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z) - Permutation Compressors for Provably Faster Distributed Nonconvex
Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al (2021) can be considered as a state-of-the-art method in terms of theoretical communication complexity.
Theory of MARINA to support the theory of potentially em correlated compressors, extends to the method beyond the classical independent compressors setting.
arXiv Detail & Related papers (2021-10-07T09:38:15Z) - CFedAvg: Achieving Efficient Communication and Fast Convergence in
Non-IID Federated Learning [8.702106020664612]
Federated learning (FL) is a prevailing distributed learning paradigm, where a large number of workers jointly learn a model without sharing their training data.
High communication costs could arise in FL due to deep-scale (deep) learning models and bandwidth-connected connections.
We introduce a distributed communication datasets called CFedAvg for FL with non-biased SNR-constrained compressors.
arXiv Detail & Related papers (2021-06-14T04:27:19Z) - A Linearly Convergent Algorithm for Decentralized Optimization: Sending
Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.