Improving Accelerated Federated Learning with Compression and Importance
Sampling
- URL: http://arxiv.org/abs/2306.03240v1
- Date: Mon, 5 Jun 2023 20:50:36 GMT
- Title: Improving Accelerated Federated Learning with Compression and Importance
Sampling
- Authors: Michał Grudzień, Grigory Malinovsky, Peter Richtárik
- Abstract summary: We present a complete method for Federated Learning that incorporates all necessary ingredients: Local Training, Compression, and Partial Participation.
We analyze the general sampling framework for partial participation and derive an importance sampling scheme, which leads to even better performance.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated Learning is a collaborative training framework that leverages
heterogeneous data distributed across a vast number of clients. Since it is
practically infeasible to request and process all clients during the
aggregation step, partial participation must be supported. In this setting, the
communication between the server and clients poses a major bottleneck. To
reduce communication loads, there are two main approaches: compression and
local steps. Recent work by Mishchenko et al. [2022] introduced the new
ProxSkip method, which achieves an accelerated rate using the local steps
technique. Follow-up works successfully combined local steps acceleration with
partial participation [Grudzień et al., 2023; Condat et al., 2023] and
gradient compression [Condat et al., 2022]. In this paper, we finally present a
complete method for Federated Learning that incorporates all necessary
ingredients: Local Training, Compression, and Partial Participation. We obtain
state-of-the-art convergence guarantees in the considered setting. Moreover, we
analyze the general sampling framework for partial participation and derive an
importance sampling scheme, which leads to even better performance. We
experimentally demonstrate the advantages of the proposed method in practice.
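As an illustration only (not the method introduced in this paper), the following sketch shows how the three ingredients named in the abstract can be wired together in a plain FedAvg-style loop: clients are subsampled with non-uniform probabilities (importance sampling), each selected client performs a few local training steps, and its model update is compressed with an unbiased rand-k sparsifier before an importance-weighted aggregation. All helper names, the toy objectives, and the choice of per-client smoothness proxies as sampling weights are assumptions of the sketch.

```python
# Illustrative sketch only -- NOT the algorithm of this paper. It wires together the
# three ingredients named in the abstract (local training, compression, partial
# participation with importance sampling) in a plain FedAvg-style loop on a toy problem.
# Helper names and the choice of importance weights are assumptions of this sketch.
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(x, grad_fn, steps, lr):
    """Local training: a few gradient steps starting from the current server model."""
    for _ in range(steps):
        x = x - lr * grad_fn(x)
    return x

def rand_k(v, k):
    """Compression: unbiased rand-k sparsifier (keep k random coordinates, rescale by d/k)."""
    d = v.size
    mask = np.zeros(d)
    mask[rng.choice(d, size=k, replace=False)] = 1.0
    return (d / k) * mask * v

def sample_cohort(probs, m):
    """Partial participation: sample m client indices with replacement, proportional to probs."""
    return rng.choice(len(probs), size=m, replace=True, p=probs)

# Toy client objectives f_i(x) = (L_i / 2) * ||x - b_i||^2, with per-client curvature L_i.
n, d = 20, 50
L = rng.uniform(1.0, 10.0, size=n)                     # per-client smoothness proxies
B = rng.normal(size=(n, d))                            # per-client optima b_i
grads = [lambda x, i=i: L[i] * (x - B[i]) for i in range(n)]

# Importance sampling (an assumed heuristic here): sample clients proportionally to L_i,
# then reweight updates by 1 / (n * p_i) so the aggregate stays an unbiased average.
probs = L / L.sum()

x, server_lr = np.zeros(d), 0.2
for _ in range(300):
    cohort = sample_cohort(probs, m=5)
    agg = np.zeros(d)
    for i in cohort:
        x_i = local_sgd(x, grads[i], steps=5, lr=0.5 / L[i])   # local training
        delta = rand_k(x_i - x, k=25)                          # compressed model update
        agg += delta / (len(cohort) * n * probs[i])            # importance-weighted average
    x = x + server_lr * agg                                    # server step

# Naive local steps bias the iterate toward the unweighted average of client optima
# (client drift) -- precisely the kind of effect accelerated local-training methods correct.
print("distance to unweighted average of client optima:", np.linalg.norm(x - B.mean(axis=0)))
```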
Related papers
- Aiding Global Convergence in Federated Learning via Local Perturbation and Mutual Similarity Information [6.767885381740953]
Federated learning has emerged as a distributed optimization paradigm.
We propose a novel modified framework wherein each client locally performs a perturbed gradient step.
We show that our algorithm speeds up convergence by up to 30 global rounds compared with FedAvg.
arXiv Detail & Related papers (2024-10-07T23:14:05Z) - Cohort Squeeze: Beyond a Single Communication Round per Cohort in Cross-Device Federated Learning [51.560590617691005]
We investigate whether it is possible to "squeeze more juice" out of each cohort than what is possible in a single communication round.
Our approach leads to up to 74% reduction in the total communication cost needed to train a FL model in the cross-device setting.
arXiv Detail & Related papers (2024-06-03T08:48:49Z) - Prune at the Clients, Not the Server: Accelerated Sparse Training in Federated Learning [56.21666819468249]
Resource constraints of clients and communication costs pose major problems for training large models in Federated Learning.
We introduce Sparse-ProxSkip, which combines training and acceleration in a sparse setting.
We demonstrate the good performance of Sparse-ProxSkip in extensive experiments.
arXiv Detail & Related papers (2024-05-31T05:21:12Z) - SPAM: Stochastic Proximal Point Method with Momentum Variance Reduction for Non-convex Cross-Device Federated Learning [48.072207894076556]
Cross-device training is a subfield of federated learning where the number of clients can reach into the billions.
Standard approaches and local methods are prone to issues concerning properties as crucial as cross-device similarity.
Our method is the first of its kind that does not require smoothness of the objective and provably benefits from clients having similar data.
arXiv Detail & Related papers (2024-05-30T15:07:30Z) - Federated Learning Can Find Friends That Are Advantageous [14.993730469216546]
In Federated Learning (FL), the distributed nature and heterogeneity of client data present both opportunities and challenges.
We introduce a novel algorithm that assigns adaptive aggregation weights to clients participating in FL training, identifying those with data distributions most conducive to a specific learning objective.
arXiv Detail & Related papers (2024-02-07T17:46:37Z) - TAMUNA: Doubly Accelerated Distributed Optimization with Local Training, Compression, and Partial Participation [53.84175614198885]
In distributed optimization and learning, several machines alternate between local computations in parallel and communication with a distant server.
We propose TAMUNA, the first algorithm for distributed optimization that jointly leverages the two strategies of local training and compression while allowing for partial participation.
arXiv Detail & Related papers (2023-02-20T08:37:44Z) - GradSkip: Communication-Accelerated Local Gradient Methods with Better
Computational Complexity [3.222802562733787]
We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing the clients to perform multiple local gradient-type training steps prior to communication.
We prove that our modified method, which we call GradSkip, converges linearly under the same assumptions.
arXiv Detail & Related papers (2022-10-28T20:59:06Z) - A Computation and Communication Efficient Method for Distributed
Nonconvex Problems in the Partial Participation Setting [58.59873548589766]
We present a new method that includes three key components: variance reduction, partial participation, and compressed communication.
We prove that the new method has optimal oracle complexity and state-of-the-art communication complexity in the partial participation setting.
arXiv Detail & Related papers (2022-05-31T07:41:20Z) - On Second-order Optimization Methods for Federated Learning [59.787198516188425]
We evaluate the performance of several second-order distributed methods with local steps in the federated learning setting.
We propose a novel variant that uses second-order local information for updates and a global line search to counteract the resulting local specificity.
arXiv Detail & Related papers (2021-09-06T12:04:08Z)