BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with
Communication Compression
- URL: http://arxiv.org/abs/2201.13320v1
- Date: Mon, 31 Jan 2022 16:14:09 GMT
- Title: BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with
Communication Compression
- Authors: Haoyu Zhao, Boyue Li, Zhize Li, Peter Richtárik, Yuejie Chi
- Abstract summary: Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications.
This paper proposes BEER, which adopts communication compression with gradient tracking, and shows that it converges at a faster rate.
- Score: 37.20712215269538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Communication efficiency has been widely recognized as the bottleneck for
large-scale decentralized machine learning applications in multi-agent or
federated environments. To tackle the communication bottleneck, there have been
many efforts to design communication-compressed algorithms for decentralized
nonconvex optimization, where the clients are only allowed to communicate a
small amount of quantized information (aka bits) with their neighbors over a
predefined graph topology. Despite significant efforts, the state-of-the-art
algorithm in the nonconvex setting still suffers from a slower rate of
convergence $O((G/T)^{2/3})$ compared with its uncompressed counterpart,
where $G$ measures the data heterogeneity across different clients, and $T$ is
the number of communication rounds. This paper proposes BEER, which adopts
communication compression with gradient tracking, and shows it converges at a
faster rate of $O(1/T)$. This significantly improves over the state-of-the-art
rate, by matching the rate without compression even under arbitrary data
heterogeneity. Numerical experiments are also provided to corroborate our
theory and confirm the practical superiority of BEER in the data heterogeneous
regime.
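To make the high-level description concrete, below is a minimal NumPy sketch of a compressed gradient-tracking loop in the spirit of what the abstract describes: each agent mixes only compressed surrogates of its model and of its gradient tracker with its neighbors. The ring topology, the top-k compressor, the step sizes, and the exact surrogate recursions are illustrative assumptions for this sketch, not the BEER algorithm as published.

```python
import numpy as np

def top_k(v, k):
    """Top-k sparsification: keep the k largest-magnitude entries of v.
    A standard contractive compressor, used here purely for illustration."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n agents."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    return W

def compressed_gradient_tracking(grad, x0, n, d, T=1000, eta=0.05, gamma=0.4, k=3):
    """One plausible compressed gradient-tracking loop. The step sizes, the
    surrogate updates, and the topology are assumptions of this sketch, not
    the exact BEER recursions. grad(i, x) returns agent i's local gradient."""
    W = ring_mixing_matrix(n)
    X = np.tile(x0, (n, 1))                            # local models, one row per agent
    V = np.array([grad(i, X[i]) for i in range(n)])    # gradient trackers
    H = np.zeros((n, d))                               # compressed surrogate of X
    Q = np.zeros((n, d))                               # compressed surrogate of V
    prev = V.copy()
    for _ in range(T):
        # Model step: only the surrogates H are mixed, so neighbors exchange
        # compressed differences rather than full d-dimensional vectors.
        X = X + gamma * (W @ H - H) - eta * V
        H = H + np.array([top_k(X[i] - H[i], k) for i in range(n)])
        # Gradient-tracking step, again mixing only compressed surrogates.
        g_new = np.array([grad(i, X[i]) for i in range(n)])
        V = V + gamma * (W @ Q - Q) + g_new - prev
        Q = Q + np.array([top_k(V[i] - Q[i], k) for i in range(n)])
        prev = g_new
    return X.mean(axis=0)

# Toy run: heterogeneous quadratics f_i(x) = 0.5 * ||x - b_i||^2, whose global
# minimizer is the average of the b_i.
n, d = 8, 10
B = np.random.default_rng(0).normal(size=(n, d))
x_hat = compressed_gradient_tracking(lambda i, x: x - B[i], np.zeros(d), n, d)
print(np.linalg.norm(x_hat - B.mean(axis=0)))   # should shrink as T grows (untuned)
```

The point of the surrogates H and Q is that neighbors only ever exchange compressed differences, while gradient tracking maintains a running estimate of the global gradient, which is how the paper avoids the dependence on the heterogeneity measure $G$ in the rate.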
Related papers
- Near Optimal Decentralized Optimization with Compression and Momentum Tracking [27.484212303346816]
We introduce MoTEF, a novel approach that integrates communication compression with Momentum Tracking and Error Feedback.
Our analysis demonstrates that MoTEF achieves most of the desired properties and significantly outperforms existing methods under data heterogeneity.
arXiv Detail & Related papers (2024-05-30T14:51:57Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD
for Communication Efficient Nonconvex Distributed Learning [58.79085525115987]
Local methods are one of the promising approaches to reduce communication time.
We show that the communication complexity is better than that of non-local methods when the heterogeneity of the local datasets is smaller than the smoothness of the local loss.
arXiv Detail & Related papers (2022-02-12T15:12:17Z) - Faster Rates for Compressed Federated Learning with Client-Variance
Reduction [23.169998435268504]
We show that COFIG and FRECON converge within $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon^2})$ communication rounds.
In the convex setting, COFIG converges within $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon^2})$ communication rounds, which, to the best of our knowledge, is the first result of this kind.
arXiv Detail & Related papers (2021-12-24T16:28:18Z) - Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\frac{\beta/\mu}{1-\rho}}\log\frac{1}{\varepsilon}\big)$ communication steps.
This rate matches, for the first time and up to poly-log factors, the lower communication complexity bounds of distributed gossip algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z) - Permutation Compressors for Provably Faster Distributed Nonconvex
Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially correlated compressors, extending the method beyond the classical independent compressors setting.
arXiv Detail & Related papers (2021-10-07T09:38:15Z) - CFedAvg: Achieving Efficient Communication and Fast Convergence in
Non-IID Federated Learning [8.702106020664612]
Federated learning (FL) is a prevailing distributed learning paradigm, where a large number of workers jointly learn a model without sharing their training data.
High communication costs arise in FL due to large-scale (deep) learning models and bandwidth-constrained connections.
We introduce a communication-efficient framework called CFedAvg for FL with non-i.i.d. datasets, which works with general SNR-constrained compressors.
arXiv Detail & Related papers (2021-06-14T04:27:19Z) - A Linearly Convergent Algorithm for Decentralized Optimization: Sending
Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators (a generic compressor sketch is given after this list).
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z)