STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal
Sample and Communication Complexities for Federated Learning
- URL: http://arxiv.org/abs/2106.10435v1
- Date: Sat, 19 Jun 2021 06:13:45 GMT
- Title: STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal
Sample and Communication Complexities for Federated Learning
- Authors: Prashant Khanduri, Pranay Sharma, Haibo Yang, Mingyi Hong, Jia Liu,
Ketan Rajawat, and Pramod K. Varshney
- Abstract summary: Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data.
It is not clear how to choose the WNs' and the server's update directions, the minibatch sizes, and the local update frequency so that the WNs use the minimum number of samples and communication rounds.
We show that there is a trade-off curve between local update frequencies and local minibatch sizes, on which the above complexities can be maintained.
- Score: 58.6792963686231
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated Learning (FL) refers to the paradigm where multiple worker nodes
(WNs) build a joint model by using local data. Despite extensive research, for
a generic non-convex FL problem, it is not clear how to choose the WNs' and
the server's update directions, the minibatch sizes, and the local update
frequency, so that the WNs use the minimum number of samples and communication
rounds to achieve the desired solution. This work addresses the above question
and considers a class of stochastic algorithms where the WNs perform a few
local updates before communication. We show that when both the WN's and the
server's directions are chosen based on a stochastic momentum estimator, the
algorithm requires $\tilde{\mathcal{O}}(\epsilon^{-3/2})$ samples and
$\tilde{\mathcal{O}}(\epsilon^{-1})$ communication rounds to compute an
$\epsilon$-stationary solution. To the best of our knowledge, this is the first
FL algorithm that achieves such near-optimal sample and communication
complexities simultaneously. Further, we show that there is a trade-off curve
between local update frequencies and local minibatch sizes, on which the above
sample and communication complexities can be maintained. Finally, we show that
for the classical FedAvg (a.k.a. Local SGD), a momentum-less special case of
STEM, a similar trade-off curve exists, albeit with worse sample and
communication complexities. Our insights on this trade-off provide guidelines
for choosing the four important design elements of FL algorithms, namely the
local update frequency, the WNs' and the server's update directions, and the
minibatch sizes, to achieve the best performance.
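To make the algorithmic idea concrete, below is a minimal sketch of a two-sided momentum loop in the spirit of STEM, assuming a STORM-style estimator for the local updates and a plain momentum buffer at the server; all names, constants, and the toy least-squares problem are illustrative, not the authors' reference implementation.

```python
import numpy as np

# Illustrative two-sided momentum FL loop on a toy least-squares problem.
rng = np.random.default_rng(0)
K, d, n = 4, 5, 200                  # workers, dimension, samples per worker
I, eta, a, beta = 5, 0.05, 0.1, 0.9  # local steps, step size, momentum params

A = rng.normal(size=(K, n, d))       # worker k holds (A[k], b[k]) locally
x_true = rng.normal(size=d)
b = np.einsum('knd,d->kn', A, x_true) + 0.1 * rng.normal(size=(K, n))

def stoch_grad(k, x, idx):           # minibatch gradient of worker k's loss
    Ak, bk = A[k][idx], b[k][idx]
    return Ak.T @ (Ak @ x - bk) / len(idx)

x = np.zeros(d)                      # server model
m = np.zeros(d)                      # server-side momentum buffer

for _ in range(50):                  # communication rounds
    directions = []
    for k in range(K):
        xk, x_prev, v = x.copy(), x.copy(), None
        for _ in range(I):           # local updates with a STORM-like estimator:
            idx = rng.choice(n, size=8, replace=False)
            g_new = stoch_grad(k, xk, idx)
            # v_t = g(x_t) + (1 - a) * (v_{t-1} - g(x_{t-1})), same minibatch
            v = g_new if v is None else g_new + (1 - a) * (v - stoch_grad(k, x_prev, idx))
            x_prev, xk = xk, xk - eta * v
        directions.append((x - xk) / (eta * I))   # average local direction
    # Second momentum site: the server smooths the averaged worker direction.
    m = beta * m + np.mean(directions, axis=0)
    x = x - eta * m

print('distance to x_true:', np.linalg.norm(x - x_true))
```

The sketch highlights the two momentum sites named in the title: each WN maintains its own variance-reduced direction across its I local steps, and the server maintains a second momentum buffer over the averaged worker directions.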
Related papers
- A Framework for testing Federated Learning algorithms using an edge-like environment [0.0]
Federated Learning (FL) is a machine learning paradigm in which many clients cooperatively train a single centralized model while keeping their data private and decentralized.
It is non-trivial to accurately evaluate the contributions of local models during global model aggregation; this is an instance of a major challenge in FL, commonly known as data or class imbalance.
In this work, a framework is proposed and implemented to assess FL algorithms in an easier and more scalable way.
arXiv Detail & Related papers (2024-07-17T19:52:53Z)
- Distributed Extra-gradient with Optimal Complexity and Communication Guarantees [60.571030754252824]
We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local dual vectors.
Extra-gradient, the de facto algorithm for monotone VI problems, was not designed to be communication-efficient.
We propose a quantized generalized extra-gradient (Q-GenX), which is an unbiased and adaptive compression method tailored to solve VIs.
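For context, here is a minimal sketch of the (uncompressed) extra-gradient step on a classic monotone operator; the quantization that Q-GenX adds is omitted, and all names and constants are illustrative.

```python
import numpy as np

# Extra-gradient on the monotone operator F(x, y) = (y, -x), the saddle
# operator of f(x, y) = x * y: plain gradient descent-ascent spirals away,
# while the look-ahead step drives the iterates to the solution (0, 0).
def F(z):
    x, y = z
    return np.array([y, -x])

z, gamma = np.array([1.0, 1.0]), 0.1
for _ in range(2000):
    z_half = z - gamma * F(z)        # extrapolation (look-ahead) step
    z = z - gamma * F(z_half)        # update using the look-ahead operator
print(z)                             # close to the solution (0, 0)
```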
arXiv Detail & Related papers (2023-08-17T21:15:04Z)
- Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance-reduction technique in cross-silo FL.
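The momentum-based variance reduction referenced here is commonly the STORM-style recursion; a brief sketch of that estimator (names illustrative):

```python
def storm_direction(g_new, g_old, d_prev, a):
    # d_t = g(x_t) + (1 - a) * (d_{t-1} - g(x_{t-1})), where both gradients
    # are evaluated on the same fresh minibatch; a in (0, 1] controls how
    # quickly stale gradient information is forgotten.
    return g_new + (1 - a) * (d_prev - g_old)
```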
arXiv Detail & Related papers (2022-12-02T05:07:50Z)
- Walk for Learning: A Random Walk Approach for Federated Learning from Heterogeneous Data [17.978941229970886]
We focus on Federated Learning (FL) as a canonical application.
One of the main challenges of FL is the communication bottleneck between the nodes and the parameter server.
We present an adaptive random walk learning algorithm.
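The summary gives no algorithmic detail; as a rough illustration only, here is a generic random-walk learning loop in which a single model hops between neighboring nodes and takes one local SGD step per visit (graph, data, and step size all hypothetical):

```python
import numpy as np

# A single model token walks a toy graph; each visited node applies one
# SGD step on its local quadratic loss f_v(w) = E[0.5 * (w - s)^2].
rng = np.random.default_rng(1)
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
data = {v: rng.normal(loc=v, size=100) for v in neighbors}

w, node, eta = 0.0, 0, 0.01
for _ in range(5000):
    s = rng.choice(data[node], size=4)       # node-local minibatch
    w -= eta * np.mean(w - s)                # gradient of f_v at w
    node = rng.choice(neighbors[node])       # uniform step to a neighbor
print(w)
```

Note that a uniform walk visits nodes in proportion to their degree, so it implicitly weights nodes unevenly; an adaptive walk can reweight transitions to correct this kind of bias.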
arXiv Detail & Related papers (2022-06-01T19:53:24Z)
- Communication-Efficient Sampling for Distributed Training of Graph Convolutional Networks [3.075766050800645]
Training Graph Convolutional Networks (GCNs) is expensive as it needs to aggregate data from neighboring nodes.
Previous works have proposed various neighbor sampling methods that estimate the aggregation result based on a small number of sampled neighbors.
We present an algorithm that determines the local sampling probabilities and ensures that the skewed neighbor sampling does not significantly affect the convergence of training.
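A minimal sketch of the underlying idea, assuming importance-weighted sampling: draw a few neighbors from a skewed distribution p, then divide by p so the aggregation estimate stays unbiased (all sizes and probabilities illustrative):

```python
import numpy as np

# Unbiased aggregation under skewed neighbor sampling: with-replacement
# draws from p, reweighted by 1/p, estimate the full neighbor-feature sum.
rng = np.random.default_rng(2)
feats = rng.normal(size=(10, 4))          # features of a node's 10 neighbors
p = np.array([0.3] * 2 + [0.05] * 8)      # skewed probabilities (sum to 1)
idx = rng.choice(10, size=4, p=p)         # sample k = 4 neighbors
est = np.mean(feats[idx] / p[idx, None], axis=0)  # E[est] = feats.sum(axis=0)
print(est, feats.sum(axis=0))
```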
arXiv Detail & Related papers (2021-01-19T16:12:44Z)
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
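As a rough illustration of the ADMM building block (not the paper's coded, mini-batch variant), here is consensus ADMM on a toy quadratic f_i(x) = 0.5 * ||x - a_i||^2, chosen so every update has a closed form:

```python
import numpy as np

# Consensus ADMM: minimize sum_i f_i(x_i) subject to x_i = z for all i,
# with f_i(x) = 0.5 * ||x - a_i||^2 so each step is in closed form.
rng = np.random.default_rng(3)
N, d, rho = 5, 3, 1.0
a = rng.normal(size=(N, d))                  # node-local data
x, u, z = np.zeros((N, d)), np.zeros((N, d)), np.zeros(d)

for _ in range(200):
    x = (a + rho * (z - u)) / (1 + rho)      # x_i-update (proximal step)
    z = np.mean(x + u, axis=0)               # z-update: consensus average
    u = u + x - z                            # scaled dual update
print(np.allclose(z, a.mean(axis=0)))        # consensus at the data average
```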
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
- Delay Minimization for Federated Learning Over Wireless Communication Networks [172.42768672943365]
The problem of delay minimization for federated learning (FL) over wireless communication networks is investigated.
A bisection search algorithm is proposed to obtain the optimal solution.
Simulation results show that the proposed algorithm can reduce delay by up to 27.3% compared to conventional FL methods.
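The blurb does not spell out the objective, so here is only a generic sketch of how a bisection search finds the smallest feasible value of a monotone quantity (the delay predicate below is hypothetical):

```python
def bisect_min(feasible, lo, hi, tol=1e-6):
    # Smallest t in [lo, hi] with feasible(t) True, assuming monotonicity:
    # once a deadline t is feasible, every larger deadline is feasible too.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if feasible(mid) else (mid, hi)
    return hi

# Hypothetical check: can every device finish compute + upload within t?
compute, comm = [0.3, 0.5], [0.2, 0.1]
print(bisect_min(lambda t: max(c + m for c, m in zip(compute, comm)) <= t, 0.0, 10.0))
```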
arXiv Detail & Related papers (2020-07-05T19:00:07Z)
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
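A minimal sketch of the CTA pattern, i.e., FedAvg-style local computation followed by aggregation (the momentum-less special case that the STEM abstract above refers to as Local SGD; names illustrative):

```python
import numpy as np

# One CTA round: every client runs I local SGD steps from the current
# global model, then the server averages the resulting local models.
def fedavg_round(x, client_grads, I=5, eta=0.05):
    local_models = []
    for grad in client_grads:        # grad(xk) returns a stochastic gradient
        xk = x.copy()
        for _ in range(I):
            xk = xk - eta * grad(xk)
        local_models.append(xk)
    return np.mean(local_models, axis=0)
```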
arXiv Detail & Related papers (2020-05-22T23:07:42Z)