GradSkip: Communication-Accelerated Local Gradient Methods with Better
Computational Complexity
- URL: http://arxiv.org/abs/2210.16402v2
- Date: Tue, 30 May 2023 18:49:36 GMT
- Title: GradSkip: Communication-Accelerated Local Gradient Methods with Better
Computational Complexity
- Authors: Artavazd Maranjyan, Mher Safaryan, Peter Richtárik
- Abstract summary: We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing the clients to perform multiple local gradient-type training steps prior to communication.
We prove that our modified method, which we call GradSkip, converges linearly under the same assumptions.
- Score: 3.222802562733787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a class of distributed optimization algorithms that aim to alleviate
high communication costs by allowing the clients to perform multiple local
gradient-type training steps prior to communication. While methods of this type
have been studied for about a decade, the empirically observed acceleration
properties of local training eluded all attempts at theoretical understanding.
In a recent breakthrough, Mishchenko et al. (ICML 2022) proved that local
training, when properly executed, leads to provable communication acceleration,
and this holds in the strongly convex regime without relying on any data
similarity assumptions. However, their method ProxSkip requires all clients to
take the same number of local training steps in each communication round.
Inspired by a common sense intuition, we start our investigation by
conjecturing that clients with "less important" data should be able to get
away with fewer local training steps without this impacting the overall
communication complexity of the method. It turns out that this intuition is
correct: we managed to redesign the original ProxSkip method to achieve this.
In particular, we prove that our modified method, for which we coin the name
GradSkip, converges linearly under the same assumptions, has the same
accelerated communication complexity, while the number of local gradient steps
can be reduced relative to a local condition number. We further generalize our
method by extending the randomness of probabilistic alternations to arbitrary
unbiased compression operators and considering a generic proximable
regularizer. This generalization, which we call GradSkip+, recovers several
related methods in the literature as special cases. Finally, we present an
empirical study on carefully designed toy problems that confirm our theoretical
claims.
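
To make the skipping mechanics concrete, here is a minimal NumPy sketch of a ProxSkip-style loop with an extra per-client coin that decides whether a client actually computes its local gradient in a given iteration. This only illustrates the structure behind GradSkip; it is not the authors' code, and the quadratic objectives, step size gamma, communication probability p, and per-client probabilities q are all made-up illustrative choices.

```python
import numpy as np

# Illustrative sketch (not the paper's reference implementation) of a
# ProxSkip-style loop where, in addition to skipping communication with
# probability 1 - p, each client i skips its local gradient step with
# probability 1 - q[i], in the spirit of GradSkip.

rng = np.random.default_rng(0)
n_clients, dim = 4, 10
A = [rng.standard_normal((dim, dim)) for _ in range(n_clients)]
A = [a.T @ a / dim + 0.1 * np.eye(dim) for a in A]   # strongly convex local quadratics
b = [rng.standard_normal(dim) for _ in range(n_clients)]

def local_grad(i, x):
    """Gradient of f_i(x) = 0.5 x^T A_i x - b_i^T x."""
    return A[i] @ x - b[i]

gamma = 0.1                 # local step size
p = 0.2                     # probability of communicating in a given iteration
q = [1.0, 0.8, 0.5, 0.3]    # illustrative per-client probabilities of taking a local step

x = [np.zeros(dim) for _ in range(n_clients)]   # local iterates
h = [np.zeros(dim) for _ in range(n_clients)]   # control variates (sum to zero)

for t in range(2000):
    # Local phase: a client takes a shifted gradient step only with probability q[i];
    # otherwise it keeps its current iterate (GradSkip's exact rule for skipped steps
    # is more refined and is what preserves the linear rate).
    x_hat = []
    for i in range(n_clients):
        if rng.random() < q[i]:
            x_hat.append(x[i] - gamma * (local_grad(i, x[i]) - h[i]))
        else:
            x_hat.append(x[i].copy())

    if rng.random() < p:
        # Communication round: the prox of the consensus constraint is averaging.
        x_bar = np.mean([x_hat[i] - (gamma / p) * h[i] for i in range(n_clients)], axis=0)
        for i in range(n_clients):
            h[i] = h[i] + (p / gamma) * (x_bar - x_hat[i])
            x[i] = x_bar.copy()
    else:
        # Skip communication: clients keep training locally.
        x = x_hat
```

In the actual method the per-client probabilities are tied to local condition numbers, so the sketch above only conveys the overall pattern of "local steps, occasional averaging, per-client skipping" rather than the precise GradSkip update.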
Related papers
- Accelerated Stochastic ExtraGradient: Mixing Hessian and Gradient Similarity to Reduce Communication in Distributed and Federated Learning [50.382793324572845]
Distributed computing involves communication between devices, which requires solving two key problems: efficiency and privacy.
In this paper, we analyze a new method that incorporates the ideas of using data similarity and clients sampling.
To address privacy concerns, we apply the technique of additional noise and analyze its impact on the convergence of the proposed method.
arXiv Detail & Related papers (2024-09-22T00:49:10Z)
- Cohort Squeeze: Beyond a Single Communication Round per Cohort in Cross-Device Federated Learning [51.560590617691005]
We investigate whether it is possible to "squeeze more juice" out of each cohort than what is possible in a single communication round.
Our approach leads to up to 74% reduction in the total communication cost needed to train a FL model in the cross-device setting.
arXiv Detail & Related papers (2024-06-03T08:48:49Z)
- Gradient-Congruity Guided Federated Sparse Training [31.793271982853188]
Federated learning (FL) is a distributed machine learning technique that enables collaborative model training across devices while preserving data privacy.
FL also faces challenges such as high computational and communication costs on resource-constrained devices.
We propose Gradient-Congruity Guided Federated Sparse Training (FedSGC), a novel method that integrates dynamic sparse training and gradient congruity inspection into the federated learning framework.
arXiv Detail & Related papers (2024-05-02T11:29:48Z)
- Improving Accelerated Federated Learning with Compression and Importance Sampling [0.0]
We present a complete method for Federated Learning that incorporates all necessary ingredients: Local Training, Compression, and Partial Participation.
We analyze the general sampling framework for partial participation and derive an importance sampling scheme, which leads to even better performance.
arXiv Detail & Related papers (2023-06-05T20:50:36Z)
- TAMUNA: Doubly Accelerated Distributed Optimization with Local Training, Compression, and Partial Participation [53.84175614198885]
In distributed optimization and learning, several machines alternate between local computations in parallel and communication with a distant server.
We propose TAMUNA, the first algorithm for distributed optimization that leverages the two strategies of local training and compression jointly and allows for partial participation.
arXiv Detail & Related papers (2023-02-20T08:37:44Z)
- Communication Acceleration of Local Gradient Methods via an Accelerated Primal-Dual Algorithm with Inexact Prox [11.564643329398438]
We propose an alternative algorithm which obtains the same communication acceleration as Mishchenko et al. (2022).
Our approach is based on the celebrated method of Chambolle and Pock (2011), with several nontrivial modifications.
Our method can be applied to optimization over a connected network, and we obtain theoretical improvements here as well.
arXiv Detail & Related papers (2022-07-08T15:24:13Z)
- DisPFL: Towards Communication-Efficient Personalized Federated Learning via Decentralized Sparse Training [84.81043932706375]
We propose a novel personalized federated learning framework in a decentralized (peer-to-peer) communication protocol named Dis-PFL.
Dis-PFL employs personalized sparse masks to customize sparse local models on the edge.
We demonstrate that our method can easily adapt to heterogeneous local clients with varying computation complexities.
arXiv Detail & Related papers (2022-06-01T02:20:57Z)
- FedChain: Chained Algorithms for Near-Optimal Communication Cost in Federated Learning [24.812767482563878]
Federated learning (FL) aims to minimize the communication complexity of training a model over heterogeneous data distributed across many clients.
We propose FedChain, an algorithmic framework that combines the strengths of local methods and global methods to achieve fast convergence in terms of the number of communication rounds R.
arXiv Detail & Related papers (2021-08-16T02:57:06Z)
- Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z)
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
- From Local SGD to Local Fixed-Point Methods for Federated Learning [17.04886864943171]
We consider the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting.
We investigate two strategies to achieve such a consensus: one based on a fixed number of local steps, and the other based on randomized computations.
arXiv Detail & Related papers (2020-04-03T09:24:10Z)