SAGDA: Achieving $\mathcal{O}(\epsilon^{-2})$ Communication Complexity
in Federated Min-Max Learning
- URL: http://arxiv.org/abs/2210.00611v1
- Date: Sun, 2 Oct 2022 20:04:50 GMT
- Title: SAGDA: Achieving $\mathcal{O}(\epsilon^{-2})$ Communication Complexity
in Federated Min-Max Learning
- Authors: Haibo Yang, Zhuqing Liu, Xin Zhang, Jia Liu
- Abstract summary: We show that SAGDA achieves a linear speedup in terms of both the number of clients and local update steps.
We also advance the current understanding on communication complexity of the standard FSGDA method for federated min-max learning.
- Score: 9.001405567602745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To lower the communication complexity of federated min-max learning, a
natural approach is to utilize the idea of infrequent communications (through
multiple local updates), as in conventional federated learning. However,
due to the more complicated inner-outer problem structure in federated min-max
learning, theoretical understandings of communication complexity for federated
min-max learning with infrequent communications remain very limited in the
literature. This is particularly true for settings with non-i.i.d. datasets and
partial client participation. To address this challenge, in this paper, we
propose a new algorithmic framework called stochastic sampling averaging
gradient descent ascent (SAGDA), which i) assembles stochastic gradient
estimators from randomly sampled clients as control variates and ii) leverages
two learning rates on both server and client sides. We show that SAGDA achieves
a linear speedup in terms of both the number of clients and local update steps,
which yields an $\mathcal{O}(\epsilon^{-2})$ communication complexity that is
orders of magnitude lower than the state of the art. Interestingly, by noting
that the standard federated stochastic gradient descent ascent (FSGDA) is in
fact a control-variate-free special version of SAGDA, we immediately arrive at
an $\mathcal{O}(\epsilon^{-2})$ communication complexity result for FSGDA.
Therefore, through the lens of SAGDA, we also advance the current understanding
on communication complexity of the standard FSGDA method for federated min-max
learning.
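
To make the two ingredients above concrete, here is a minimal sketch of one SAGDA-style communication round. It is an illustration under assumptions, not the authors' implementation: the `stoch_grad(x, y)` client interface, the SCAFFOLD-style control-variate correction, and all variable names are hypothetical.

```python
import numpy as np

def sagda_style_round(clients, x, y, cv_x, cv_y,
                      eta_c=0.01, eta_s=1.0, local_steps=8,
                      sample_size=5, rng=None):
    """One SAGDA-style round (illustrative sketch only).

    Each client is assumed to expose stoch_grad(x, y) -> (gx, gy) and to
    store its own control variates cv_x, cv_y; the cv_x, cv_y arguments
    are server-side control variates assembled from previously sampled
    clients' gradient estimates.
    """
    rng = rng or np.random.default_rng(0)
    sampled = rng.choice(len(clients), size=sample_size, replace=False)
    deltas_x, deltas_y, grads_x, grads_y = [], [], [], []

    for i in sampled:
        xi, yi = x.copy(), y.copy()
        for _ in range(local_steps):
            gx, gy = clients[i].stoch_grad(xi, yi)
            # Control-variate-corrected local directions; setting all
            # control variates to zero recovers plain FSGDA local steps.
            xi -= eta_c * (gx - clients[i].cv_x + cv_x)  # descent on x
            yi += eta_c * (gy - clients[i].cv_y + cv_y)  # ascent on y
        deltas_x.append(xi - x)
        deltas_y.append(yi - y)
        # Fresh gradient estimates at the global iterate refresh the
        # control variates of the sampled clients.
        gx0, gy0 = clients[i].stoch_grad(x, y)
        clients[i].cv_x, clients[i].cv_y = gx0, gy0
        grads_x.append(gx0)
        grads_y.append(gy0)

    # Server-side learning rate applied to the averaged local model changes.
    x_new = x + eta_s * np.mean(deltas_x, axis=0)
    y_new = y + eta_s * np.mean(deltas_y, axis=0)
    # Server control variates built from the sampled clients' estimates.
    return x_new, y_new, np.mean(grads_x, axis=0), np.mean(grads_y, axis=0)
```

In this sketch the two learning rates play distinct roles: `eta_c` controls how aggressively each sampled client moves during its local descent-ascent steps, while `eta_s` controls how much of the averaged local change the server applies per round.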
Related papers
- Byzantine-resilient Federated Learning Employing Normalized Gradients on Non-IID Datasets [23.640506243685863]
In practical federated learning, the presence of malicious attacks and data heterogeneity often introduces biases into the learning process.
We propose a Federated Normalized Gradients Algorithm (Fed-NGA) that normalizes uploaded local gradients to be unit vectors before aggregation, achieving a time complexity of $\mathcal{O}(pM)$.
arXiv Detail & Related papers (2024-08-18T16:50:39Z) - Solving a Class of Non-Convex Minimax Optimization in Federated Learning [84.98927714326908]
Minimax problems arise throughout machine learning applications, ranging from machine learning training to large-scale learning.
We propose a class of algorithms for non-convex minimax problems that reduces the complexity to $O(\varepsilon^{-6})$.
We prove that FedSGDA-M has the best known sample complexity of $O(\kappa^{2}\epsilon^{-3})$ and the best known communication complexity of $O(\kappa^{2}\epsilon^{-3})$.
arXiv Detail & Related papers (2023-10-05T15:48:41Z) - Communication-Efficient Federated Bilevel Optimization with Local and
Global Lower Level Problems [118.00379425831566]
We propose a communication-efficient algorithm, named FedBiOAcc.
We prove that FedBiOAcc-Local converges at the same rate for this type of problem.
Empirical results show superior performance of our algorithms.
arXiv Detail & Related papers (2023-02-13T21:28:53Z) - FedDA: Faster Framework of Local Adaptive Gradient Methods via Restarted
Dual Averaging [104.41634756395545]
Federated learning (FL) is an emerging learning paradigm to tackle massively distributed data.
We propose FedDA, a novel framework for local adaptive gradient methods.
We show that FedDA-MVR is the first adaptive FL algorithm that achieves this rate.
arXiv Detail & Related papers (2023-02-13T05:10:30Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on our momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - INTERACT: Achieving Low Sample and Communication Complexities in
Decentralized Bilevel Learning over Networks [24.02913189682224]
Decentralized bilevel optimization problems have received increasing attention in the networking and machine learning communities.
Low sample and communication complexities are two fundamental challenges that remain under-explored.
Our work is the first that achieves both low sample and communication complexities for solving decentralized bilevel optimization problems over networks.
arXiv Detail & Related papers (2022-07-27T04:19:28Z) - Federated Minimax Optimization: Improved Convergence Analyses and
Algorithms [32.062312674333775]
We consider nonconvex minimax optimization, which is gaining prominence in many modern machine learning applications such as GANs.
We provide a novel and tighter analysis of the algorithm, improving the convergence and communication guarantees in the existing literature.
arXiv Detail & Related papers (2022-03-09T16:21:31Z) - DASHA: Distributed Nonconvex Optimization with Communication
Compression, Optimal Oracle Complexity, and No Client Synchronization [77.34726150561087]
We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems.
Unlike MARINA, the new methods DASHA and DASHA-MVR send compressed vectors only and never synchronize the nodes, which makes them more practical for federated learning.
arXiv Detail & Related papers (2022-02-02T20:10:40Z) - STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal
Sample and Communication Complexities for Federated Learning [58.6792963686231]
Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data.
It is not clear how to choose the WNs' update directions, the minibatch sizes, and the local update frequency.
We show that there is a trade-off curve between local update frequencies and local minibatch sizes, on which the above complexities can be maintained.
arXiv Detail & Related papers (2021-06-19T06:13:45Z) - Multi-Agent Off-Policy TD Learning: Finite-Time Analysis with
Near-Optimal Sample Complexity and Communication Complexity [13.100926925535578]
We develop two decentralized TD with correction (TDC) algorithms for multi-agent off-policy TD learning.
Our algorithms preserve full privacy of the actions, policies and rewards of the agents, and adopt mini-batch sampling to reduce the sampling variance and communication frequency.
arXiv Detail & Related papers (2021-03-24T12:48:08Z) - O(1) Communication for Distributed SGD through Two-Level Gradient
Averaging [0.0]
We introduce a strategy called two-level gradient averaging (A2SGD) to consolidate all gradients down to merely two local averages per worker.
Our theoretical analysis shows that A2SGD converges similarly to the default distributed SGD algorithm.
arXiv Detail & Related papers (2020-06-12T18:20:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.