GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity
- URL: http://arxiv.org/abs/2210.16402v3
- Date: Mon, 09 Jun 2025 15:48:07 GMT
- Title: GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity
- Authors: Artavazd Maranjyan, Mher Safaryan, Peter Richtárik,
- Abstract summary: We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing clients to perform multiple local gradient-type training steps before communication.<n>In particular, we prove that our modified method, GradSkip, converges linearly under the same assumptions and has the same accelerated communication complexity.
- Score: 54.585248253601314
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing clients to perform multiple local gradient-type training steps before communication. In a recent breakthrough, Mishchenko et al. (2022) proved that local training, when properly executed, leads to provable communication acceleration, and this holds in the strongly convex regime without relying on any data similarity assumptions. However, their ProxSkip method requires all clients to take the same number of local training steps in each communication round. We propose a redesign of the ProxSkip method, allowing clients with ``less important'' data to get away with fewer local training steps without impacting the overall communication complexity of the method. In particular, we prove that our modified method, GradSkip, converges linearly under the same assumptions and has the same accelerated communication complexity, while the number of local gradient steps can be reduced relative to a local condition number. We further generalize our method by extending the randomness of probabilistic alternations to arbitrary unbiased compression operators and by considering a generic proximable regularizer. This generalization, which we call GradSkip+, recovers several related methods in the literature as special cases. Finally, we present an empirical study on carefully designed toy problems that confirm our theoretical claims.
Related papers
- Accelerated Stochastic ExtraGradient: Mixing Hessian and Gradient Similarity to Reduce Communication in Distributed and Federated Learning [50.382793324572845]
Distributed computing involves communication between devices, which requires solving two key problems: efficiency and privacy.
In this paper, we analyze a new method that incorporates the ideas of using data similarity and clients sampling.
To address privacy concerns, we apply the technique of additional noise and analyze its impact on the convergence of the proposed method.
arXiv Detail & Related papers (2024-09-22T00:49:10Z) - Cohort Squeeze: Beyond a Single Communication Round per Cohort in Cross-Device Federated Learning [51.560590617691005]
We investigate whether it is possible to squeeze more juice" out of each cohort than what is possible in a single communication round.
Our approach leads to up to 74% reduction in the total communication cost needed to train a FL model in the cross-device setting.
arXiv Detail & Related papers (2024-06-03T08:48:49Z) - Gradient-Congruity Guided Federated Sparse Training [31.793271982853188]
Federated learning (FL) is a distributed machine learning technique that facilitates this process while preserving data privacy.
FL also faces challenges such as high computational and communication costs regarding resource-constrained devices.
We propose the Gradient-Congruity Guided Federated Sparse Training (FedSGC), a novel method that integrates dynamic sparse training and gradient congruity inspection into federated learning framework.
arXiv Detail & Related papers (2024-05-02T11:29:48Z) - Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs a local neighborhood sampling to reduce the dataset size in each without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique, which has a lower time upper bound complexity and reduces the memory complexity from O(n2) to O(kn) with k n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - Improving Accelerated Federated Learning with Compression and Importance
Sampling [0.0]
We present a complete method for Federated Learning that incorporates all necessary ingredients: Local Training, Compression, and Partial Participation.
We analyze the general sampling framework for partial participation and derive an importance sampling scheme, which leads to even better performance.
arXiv Detail & Related papers (2023-06-05T20:50:36Z) - TAMUNA: Doubly Accelerated Distributed Optimization with Local Training, Compression, and Partial Participation [53.84175614198885]
In distributed optimization and learning, several machines alternate between local computations in parallel and communication with a distant server.
We propose TAMUNA, the first algorithm for distributed optimization that leveraged the two strategies of local training and compression jointly and allows for partial participation.
arXiv Detail & Related papers (2023-02-20T08:37:44Z) - Federated Minimax Optimization with Client Heterogeneity [11.558008138030845]
Minimax computation has seen a surge in interest with the advent modern applications such as GANs.
We propose a general federated minimax framework that subsumes settings and existing methods like Local SGDA.
arXiv Detail & Related papers (2023-02-08T18:33:55Z) - Communication Acceleration of Local Gradient Methods via an Accelerated
Primal-Dual Algorithm with Inexact Prox [11.564643329398438]
We propose an alternative algorithm which obtains the same communication acceleration as Mishchenko et al (2022)
Our approach is based on the celebrated method of Chambolle and Pock (2011), with several nontrivial modifications.
Our method can be applied to optimization over a connected network, and we obtain theoretical improvements here as well.
arXiv Detail & Related papers (2022-07-08T15:24:13Z) - DisPFL: Towards Communication-Efficient Personalized Federated Learning
via Decentralized Sparse Training [84.81043932706375]
We propose a novel personalized federated learning framework in a decentralized (peer-to-peer) communication protocol named Dis-PFL.
Dis-PFL employs personalized sparse masks to customize sparse local models on the edge.
We demonstrate that our method can easily adapt to heterogeneous local clients with varying computation complexities.
arXiv Detail & Related papers (2022-06-01T02:20:57Z) - Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD
for Communication Efficient Nonconvex Distributed Learning [58.79085525115987]
Local methods are one of the promising approaches to reduce communication time.
We show that the communication complexity is better than non-local methods when the local datasets is smaller than the smoothness local loss.
arXiv Detail & Related papers (2022-02-12T15:12:17Z) - Communication-Compressed Adaptive Gradient Method for Distributed
Nonconvex Optimization [21.81192774458227]
One of the major bottlenecks is the large communication cost between the central server and the local workers.
Our proposed distributed learning framework features an effective gradient gradient compression strategy.
arXiv Detail & Related papers (2021-11-01T04:54:55Z) - FedChain: Chained Algorithms for Near-Optimal Communication Cost in
Federated Learning [24.812767482563878]
Federated learning (FL) aims to minimize the communication complexity of training a model over heterogeneous data distributed across many clients.
We propose FedChain, an algorithmic framework that combines the strengths of local methods and global methods to achieve fast convergence in terms of R.
arXiv Detail & Related papers (2021-08-16T02:57:06Z) - Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z) - Federated Learning with Compression: Unified Analysis and Sharp
Guarantees [39.092596142018195]
Communication cost is often a critical bottleneck to scale up distributed optimization algorithms to collaboratively learn a model from millions of devices.
Two notable trends to deal with the communication overhead of federated compression and computation are unreliable compression and heterogeneous communication.
We analyze their convergence in both homogeneous and heterogeneous data distribution settings.
arXiv Detail & Related papers (2020-07-02T14:44:07Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z) - From Local SGD to Local Fixed-Point Methods for Federated Learning [17.04886864943171]
We consider the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting.
We investigate two strategies to achieve such a consensus: one based on a fixed number of local steps, and the other based on randomized computations.
arXiv Detail & Related papers (2020-04-03T09:24:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.