Improving Accelerated Federated Learning with Compression and Importance
Sampling
- URL: http://arxiv.org/abs/2306.03240v1
- Date: Mon, 5 Jun 2023 20:50:36 GMT
- Title: Improving Accelerated Federated Learning with Compression and Importance
Sampling
- Authors: Michał Grudzień, Grigory Malinovsky, Peter Richtárik
- Abstract summary: We present a complete method for Federated Learning that incorporates all necessary ingredients: Local Training, Compression, and Partial Participation.
We analyze the general sampling framework for partial participation and derive an importance sampling scheme, which leads to even better performance.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated Learning is a collaborative training framework that leverages
heterogeneous data distributed across a vast number of clients. Since it is
practically infeasible to request and process all clients during the
aggregation step, partial participation must be supported. In this setting, the
communication between the server and clients poses a major bottleneck. To
reduce communication loads, there are two main approaches: compression and
local steps. Recent work by Mishchenko et al. [2022] introduced the new
ProxSkip method, which achieves an accelerated rate using the local steps
technique. Follow-up works successfully combined local steps acceleration with
partial participation [Grudzień et al., 2023; Condat et al., 2023] and
gradient compression [Condat et al., 2022]. In this paper, we finally present a
complete method for Federated Learning that incorporates all necessary
ingredients: Local Training, Compression, and Partial Participation. We obtain
state-of-the-art convergence guarantees in the considered setting. Moreover, we
analyze the general sampling framework for partial participation and derive an
importance sampling scheme, which leads to even better performance. We
experimentally demonstrate the advantages of the proposed method in practice.
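As an illustration only (not the method introduced in this paper), the following sketch shows how the three ingredients named in the abstract can be wired together in a plain FedAvg-style loop: clients are subsampled with non-uniform probabilities (importance sampling), each selected client performs a few local training steps, and its model update is compressed with an unbiased rand-k sparsifier before an importance-weighted aggregation. All helper names, the toy objectives, and the choice of per-client smoothness proxies as sampling weights are assumptions of the sketch.

```python
# Illustrative sketch only -- NOT the algorithm of this paper. It wires together the
# three ingredients named in the abstract (local training, compression, partial
# participation with importance sampling) in a plain FedAvg-style loop on a toy problem.
# Helper names and the choice of importance weights are assumptions of this sketch.
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(x, grad_fn, steps, lr):
    """Local training: a few gradient steps starting from the current server model."""
    for _ in range(steps):
        x = x - lr * grad_fn(x)
    return x

def rand_k(v, k):
    """Compression: unbiased rand-k sparsifier (keep k random coordinates, rescale by d/k)."""
    d = v.size
    mask = np.zeros(d)
    mask[rng.choice(d, size=k, replace=False)] = 1.0
    return (d / k) * mask * v

def sample_cohort(probs, m):
    """Partial participation: sample m client indices with replacement, proportional to probs."""
    return rng.choice(len(probs), size=m, replace=True, p=probs)

# Toy client objectives f_i(x) = (L_i / 2) * ||x - b_i||^2, with per-client curvature L_i.
n, d = 20, 50
L = rng.uniform(1.0, 10.0, size=n)                     # per-client smoothness proxies
B = rng.normal(size=(n, d))                            # per-client optima b_i
grads = [lambda x, i=i: L[i] * (x - B[i]) for i in range(n)]

# Importance sampling (an assumed heuristic here): sample clients proportionally to L_i,
# then reweight updates by 1 / (n * p_i) so the aggregate stays an unbiased average.
probs = L / L.sum()

x, server_lr = np.zeros(d), 0.2
for _ in range(300):
    cohort = sample_cohort(probs, m=5)
    agg = np.zeros(d)
    for i in cohort:
        x_i = local_sgd(x, grads[i], steps=5, lr=0.5 / L[i])   # local training
        delta = rand_k(x_i - x, k=25)                          # compressed model update
        agg += delta / (len(cohort) * n * probs[i])            # importance-weighted average
    x = x + server_lr * agg                                    # server step

# Naive local steps bias the iterate toward the unweighted average of client optima
# (client drift) -- precisely the kind of effect accelerated local-training methods correct.
print("distance to unweighted average of client optima:", np.linalg.norm(x - B.mean(axis=0)))
```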
Related papers
- Aiding Global Convergence in Federated Learning via Local Perturbation and Mutual Similarity Information [6.767885381740953]
Federated learning has emerged as a distributed optimization paradigm.
We propose a novel modified framework wherein each client locally performs a perturbed gradient step.
We show that our algorithm speeds up convergence by up to 30 global rounds compared with FedAvg.
arXiv Detail & Related papers (2024-10-07T23:14:05Z) - Cohort Squeeze: Beyond a Single Communication Round per Cohort in Cross-Device Federated Learning [51.560590617691005]
We investigate whether it is possible to "squeeze more juice" out of each cohort than what is possible in a single communication round.
Our approach leads to up to 74% reduction in the total communication cost needed to train a FL model in the cross-device setting.
arXiv Detail & Related papers (2024-06-03T08:48:49Z) - Prune at the Clients, Not the Server: Accelerated Sparse Training in Federated Learning [56.21666819468249]
Resource constraints of clients and communication costs pose major problems for training large models in Federated Learning.
We introduce Sparse-ProxSkip, which combines training and acceleration in a sparse setting.
We demonstrate the good performance of Sparse-ProxSkip in extensive experiments.
arXiv Detail & Related papers (2024-05-31T05:21:12Z) - SPAM: Stochastic Proximal Point Method with Momentum Variance Reduction for Non-convex Cross-Device Federated Learning [48.072207894076556]
Cross-device training is a subfield of federated learning where the number of clients can reach into the billions.
Standard approaches and local methods are prone to issues concerning properties as crucial as cross-device similarity.
Our method is the first of its kind that does not require smoothness of the objective and provably benefits from clients having similar data.
arXiv Detail & Related papers (2024-05-30T15:07:30Z) - Federated Learning Can Find Friends That Are Advantageous [14.993730469216546]
In Federated Learning (FL), the distributed nature and heterogeneity of client data present both opportunities and challenges.
We introduce a novel algorithm that assigns adaptive aggregation weights to clients participating in FL training, identifying those with data distributions most conducive to a specific learning objective.
arXiv Detail & Related papers (2024-02-07T17:46:37Z) - TAMUNA: Doubly Accelerated Distributed Optimization with Local Training, Compression, and Partial Participation [53.84175614198885]
In distributed optimization and learning, several machines alternate between local computations in parallel and communication with a distant server.
We propose TAMUNA, the first algorithm for distributed optimization that jointly leverages the two strategies of local training and compression while allowing for partial participation.
arXiv Detail & Related papers (2023-02-20T08:37:44Z) - GradSkip: Communication-Accelerated Local Gradient Methods with Better
Computational Complexity [3.222802562733787]
We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing the clients to perform multiple local gradient-type training steps prior to communication.
We prove that our modified method, which we call GradSkip, converges linearly under the same assumptions.
arXiv Detail & Related papers (2022-10-28T20:59:06Z) - A Computation and Communication Efficient Method for Distributed
Nonconvex Problems in the Partial Participation Setting [58.59873548589766]
We present a new method that includes three key components: variance reduction, partial participation, and compressed communication.
We prove that the new method has optimal oracle complexity and state-of-the-art communication complexity in the partial participation setting.
arXiv Detail & Related papers (2022-05-31T07:41:20Z) - On Second-order Optimization Methods for Federated Learning [59.787198516188425]
We evaluate the performance of several second-order distributed methods with local steps in the federated learning setting.
We propose a novel variant that uses second-order local information for updates and a global line search to counteract the resulting local specificity.
arXiv Detail & Related papers (2021-09-06T12:04:08Z)