Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees
- URL: http://arxiv.org/abs/2507.08784v2
- Date: Sun, 20 Jul 2025 14:53:50 GMT
- Title: Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees
- Authors: Chuyan Chen, Yutong He, Pengrui Li, Weichen Jia, Kun Yuan
- Abstract summary: We propose GreedyLore, the first greedy low-rank gradient compression algorithm for distributed learning with rigorous convergence guarantees. We prove that GreedyLore achieves a convergence rate of $\mathcal{O}(\sigma/\sqrt{NT} + 1/T)$ under standard optimizers such as MSGD and Adam, marking the first linear speedup convergence rate for low-rank gradient compression.
- Score: 13.806112971330482
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed optimization is pivotal for large-scale signal processing and machine learning, yet communication overhead remains a major bottleneck. Low-rank gradient compression, in which the transmitted gradients are approximated by low-rank matrices to reduce communication, offers a promising remedy. Existing methods typically adopt either randomized or greedy compression strategies: randomized approaches project gradients onto randomly chosen subspaces, introducing high variance and degrading empirical performance; greedy methods select the most informative subspaces, achieving strong empirical results but lacking convergence guarantees. To address this gap, we propose GreedyLore--the first Greedy Low-Rank gradient compression algorithm for distributed learning with rigorous convergence guarantees. GreedyLore incorporates error feedback to correct the bias introduced by greedy compression and introduces a semi-lazy subspace update that ensures the compression operator remains contractive throughout all iterations. With these techniques, we prove that GreedyLore achieves a convergence rate of $\mathcal{O}(\sigma/\sqrt{NT} + 1/T)$ under standard optimizers such as MSGD and Adam--marking the first linear speedup convergence rate for low-rank gradient compression. Extensive experiments are conducted to validate our theoretical findings.
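The two ingredients highlighted above can be pictured with a minimal single-worker sketch: greedily project the error-corrected gradient matrix onto its top singular subspace, and keep the projection residual as error feedback for the next step. The NumPy function below is an illustrative assumption, not the authors' implementation; in particular it omits the semi-lazy subspace update and the distributed aggregation.

```python
import numpy as np

def greedy_lowrank_with_error_feedback(grad, error, rank):
    """Greedy low-rank compression with error feedback (illustrative sketch).

    grad  : current gradient matrix (e.g., one weight matrix's gradient)
    error : accumulated compression residual from previous steps
    rank  : target rank of the transmitted update
    """
    corrected = grad + error                                   # error feedback: re-inject past residual
    U, S, Vt = np.linalg.svd(corrected, full_matrices=False)   # greedy choice: top singular subspace
    low_rank = U[:, :rank] @ np.diag(S[:rank]) @ Vt[:rank, :]  # what would be communicated
    new_error = corrected - low_rank                           # residual kept locally for the next step
    return low_rank, new_error

# Toy usage: compress a sequence of random 64x32 "gradients" to rank 4.
rng = np.random.default_rng(0)
error = np.zeros((64, 32))
for _ in range(3):
    grad = rng.standard_normal((64, 32))
    update, error = greedy_lowrank_with_error_feedback(grad, error, rank=4)
```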
Related papers
- Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness [50.78508362183774]
Shuffling-type gradient methods are favored in practice for their simplicity and rapid empirical performance.
Most existing analyses, however, require the Lipschitz smoothness condition, which is often not met in common machine learning settings.
arXiv Detail & Related papers (2025-07-11T15:36:48Z)
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
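As a rough illustration of one-bit gradient compression with controlled variance (not the FO-SGD construction itself), the sketch below dithers the gradient with symmetric uniform noise before taking signs and rescales by the dither range; the scale choice and the worker-style averaging are assumptions.

```python
import numpy as np

def dithered_one_bit(grad, rng):
    """Generic dithered one-bit compressor (illustrative only, not FO-SGD).

    With uniform dither on [-s, s], E[s * sign(g + u)] = g whenever |g| <= s,
    so the rescaled signs are an unbiased surrogate for every in-range entry.
    """
    s = np.max(np.abs(grad))                 # dither range covering all entries
    u = rng.uniform(-s, s, size=grad.shape)  # symmetric uniform dither
    return s * np.sign(grad + u)             # one bit per coordinate, plus the scalar s

rng = np.random.default_rng(1)
g = rng.standard_normal(10_000)
# Averaging several independent compressions (as a server would across workers) shrinks the variance.
g_hat = np.mean([dithered_one_bit(g, rng) for _ in range(32)], axis=0)
```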
arXiv Detail & Related papers (2024-05-17T21:17:27Z)
- Lower Bounds and Accelerated Algorithms in Distributed Stochastic Optimization with Communication Compression [39.65082601416051]
Communication compression is an essential strategy for alleviating communication overhead.
We propose NEOLITHIC, a nearly optimal algorithm for compression under mild conditions.
arXiv Detail & Related papers (2023-05-12T17:02:43Z)
- $z$-SignFedAvg: A Unified Stochastic Sign-based Compression for Federated Learning [14.363110221372274]
Federated Learning (FL) is a promising privacy-preserving distributed learning paradigm.
FL suffers from high communication cost when training large-scale machine learning models.
We propose a novel noisy perturbation scheme with a general symmetric noise distribution for sign-based compression.
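A hedged reading of that scheme in code: each worker adds symmetric noise to its gradient before taking signs, and the server averages the one-bit messages. The Gaussian noise choice, its scale, and the worker count below are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def noisy_sign(grad, noise_scale, rng):
    """Sign compression after a symmetric noise perturbation (illustrative)."""
    return np.sign(grad + rng.normal(0.0, noise_scale, size=grad.shape))

rng = np.random.default_rng(2)
true_grad = rng.standard_normal(256)
num_workers, noise_scale = 16, 0.5
# Each worker perturbs and signs its copy of the gradient; the server averages the 1-bit messages.
server_avg = np.mean(
    [noisy_sign(true_grad, noise_scale, rng) for _ in range(num_workers)], axis=0
)
```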
arXiv Detail & Related papers (2023-02-06T06:54:49Z)
- Federated Optimization Algorithms with Random Reshuffling and Gradient Compression [2.7554288121906296]
We provide the first analysis of methods with gradient compression and without-replacement sampling.
We show how to reduce the variance coming from gradient quantization through the use of control iterates.
We outline several settings in which the proposed methods improve upon existing algorithms.
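The control-iterate idea can be sketched as follows: rather than compressing the gradient itself, compress its difference from a slowly updated reference vector that worker and server keep in sync, so the quantized message (and its variance) shrinks as the reference catches up. The random-k sparsifier, the step size `alpha`, and all names below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def random_k(vec, k, rng):
    """Unbiased random-k sparsifier: keep k random coordinates, rescale by d/k."""
    d = vec.size
    mask = np.zeros(d)
    mask[rng.choice(d, size=k, replace=False)] = d / k
    return vec * mask

rng = np.random.default_rng(3)
d, k, alpha = 100, 10, 0.1
control = np.zeros(d)                              # control iterate mirrored on worker and server
for _ in range(200):
    grad = rng.standard_normal(d) + 1.0            # stand-in for a stochastic gradient
    message = random_k(grad - control, k, rng)     # only the compressed difference is transmitted
    grad_estimate = control + message              # server-side gradient estimate
    control = control + alpha * message            # both sides apply the same control update
```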
arXiv Detail & Related papers (2022-06-14T17:36:47Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
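A minimal sketch of that combination on a toy least-squares problem over an l1 ball: one sample per iteration refreshes a SAG-style gradient table, and the averaged gradient drives the Frank-Wolfe (conditional gradient) step. The problem instance, radius, and step-size schedule are made-up illustrative choices, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, radius = 200, 20, 5.0
A = rng.standard_normal((n, d))
x_true = np.zeros(d); x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.1 * rng.standard_normal(n)

grad_table = np.zeros((n, d))    # per-sample gradient memory (SAG-style estimator)
avg_grad = np.zeros(d)
x = np.zeros(d)

for t in range(1, 2001):
    i = rng.integers(n)                            # one stochastic sample per iteration
    g_i = (A[i] @ x - b[i]) * A[i]                 # gradient of 0.5 * (a_i^T x - b_i)^2
    avg_grad += (g_i - grad_table[i]) / n          # refresh the running average
    grad_table[i] = g_i
    j = int(np.argmax(np.abs(avg_grad)))           # linear minimization oracle on the l1 ball
    s = np.zeros(d); s[j] = -radius * np.sign(avg_grad[j])
    gamma = 2.0 / (t + 2)                          # standard conditional-gradient step size
    x = (1 - gamma) * x + gamma * s
```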
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- Communication-Efficient Federated Learning via Quantized Compressed Sensing [82.10695943017907]
The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server.
Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression.
We demonstrate that the framework achieves almost identical performance to the case with no compression.
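A hedged sketch of the device/server split described above: the device sparsifies its gradient, takes a few random linear measurements, and coarsely quantizes them; the server recovers the sparse gradient with orthogonal matching pursuit. The measurement matrix, the bit budget, and the OMP recovery step are generic stand-ins, not the paper's quantized-compressed-sensing design.

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit: recover a k-sparse x from y ~= Phi @ x."""
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(5)
d, k, m = 256, 8, 64
grad = rng.standard_normal(d)

# Device side: top-k sparsification, random Gaussian measurements, coarse 8-bit-style quantization.
idx = np.argsort(np.abs(grad))[-k:]
sparse = np.zeros(d); sparse[idx] = grad[idx]
Phi = rng.standard_normal((m, d)) / np.sqrt(m)   # sensing matrix shared with the server
y = Phi @ sparse
step = np.max(np.abs(y)) / 127
y_quantized = np.round(y / step) * step          # m quantized scalars instead of d floats

# Server side: reconstruct the sparse gradient from the quantized measurements.
grad_hat = omp(Phi, y_quantized, k)
```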
arXiv Detail & Related papers (2021-11-30T02:13:54Z)
- BROADCAST: Reducing Both Stochastic and Compression Noise to Robustify Communication-Efficient Federated Learning [24.016538592246377]
Communication between workers and the master node to collect local gradients is a key bottleneck in large-scale learning systems.
In this work, we investigate the problem of Byzantine-robust federated learning with compression, where the attacks from Byzantine workers can be arbitrarily malicious.
In light of this observation, we propose to jointly reduce the stochastic noise and the compression noise so as to improve Byzantine-robustness.
arXiv Detail & Related papers (2021-04-14T08:16:03Z)
- Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes the first LinEAr convergent Decentralized algorithm with compression, LEAD.
arXiv Detail & Related papers (2020-07-01T04:35:00Z)
- On Biased Compression for Distributed Learning [55.89300593805943]
We show for the first time that biased compressors can lead to linear convergence rates in both the single-node and distributed settings.
We propose several new biased compressors with promising theoretical guarantees and practical performance.
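Top-k is the canonical biased compressor behind results of this kind; the sketch below checks its delta-contraction property, $\|C(x) - x\|^2 \le (1 - k/d)\|x\|^2$, which is the type of assumption such convergence analyses build on. The check is purely illustrative.

```python
import numpy as np

def top_k(vec, k):
    """Top-k compressor: keep the k largest-magnitude entries (biased, no rescaling)."""
    out = np.zeros_like(vec)
    idx = np.argsort(np.abs(vec))[-k:]
    out[idx] = vec[idx]
    return out

rng = np.random.default_rng(6)
d, k = 100, 10
for _ in range(5):
    x = rng.standard_normal(d)
    compression_error = np.linalg.norm(top_k(x, k) - x) ** 2
    bound = (1 - k / d) * np.linalg.norm(x) ** 2
    assert compression_error <= bound + 1e-9   # delta-contraction holds deterministically for top-k
```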
arXiv Detail & Related papers (2020-02-27T19:52:24Z)