SAGDA: Achieving $\mathcal{O}(\epsilon^{-2})$ Communication Complexity
in Federated Min-Max Learning
- URL: http://arxiv.org/abs/2210.00611v1
- Date: Sun, 2 Oct 2022 20:04:50 GMT
- Title: SAGDA: Achieving $\mathcal{O}(\epsilon^{-2})$ Communication Complexity
in Federated Min-Max Learning
- Authors: Haibo Yang, Zhuqing Liu, Xin Zhang, Jia Liu
- Abstract summary: We show that SAGDA achieves a linear speedup in terms of both the number of clients and local update steps.
We also advance the current understanding on communication complexity of the standard FSGDA method for federated min-max learning.
- Score: 9.001405567602745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To lower the communication complexity of federated min-max learning, a
natural approach is to utilize the idea of infrequent communications (through
multiple local updates), as in conventional federated learning. However,
due to the more complicated inner-outer problem structure in federated min-max
learning, theoretical understandings of communication complexity for federated
min-max learning with infrequent communications remain very limited in the
literature. This is particularly true for settings with non-i.i.d. datasets and
partial client participation. To address this challenge, in this paper, we
propose a new algorithmic framework called stochastic sampling averaging
gradient descent ascent (SAGDA), which i) assembles stochastic gradient
estimators from randomly sampled clients as control variates and ii) leverages
two learning rates on both server and client sides. We show that SAGDA achieves
a linear speedup in terms of both the number of clients and local update steps,
which yields an $\mathcal{O}(\epsilon^{-2})$ communication complexity that is
orders of magnitude lower than the state of the art. Interestingly, by noting
that the standard federated stochastic gradient descent ascent (FSGDA) is in
fact a control-variate-free special version of SAGDA, we immediately arrive at
an $\mathcal{O}(\epsilon^{-2})$ communication complexity result for FSGDA.
Therefore, through the lens of SAGDA, we also advance the current understanding
on communication complexity of the standard FSGDA method for federated min-max
learning.
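
To make the two ingredients above concrete, here is a minimal sketch of one SAGDA-style communication round. It is an illustration under assumptions, not the authors' implementation: the `stoch_grad(x, y)` client interface, the SCAFFOLD-style control-variate correction, and all variable names are hypothetical.

```python
import numpy as np

def sagda_style_round(clients, x, y, cv_x, cv_y,
                      eta_c=0.01, eta_s=1.0, local_steps=8,
                      sample_size=5, rng=None):
    """One SAGDA-style round (illustrative sketch only).

    Each client is assumed to expose stoch_grad(x, y) -> (gx, gy) and to
    store its own control variates cv_x, cv_y; the cv_x, cv_y arguments
    are server-side control variates assembled from previously sampled
    clients' gradient estimates.
    """
    rng = rng or np.random.default_rng(0)
    sampled = rng.choice(len(clients), size=sample_size, replace=False)
    deltas_x, deltas_y, grads_x, grads_y = [], [], [], []

    for i in sampled:
        xi, yi = x.copy(), y.copy()
        for _ in range(local_steps):
            gx, gy = clients[i].stoch_grad(xi, yi)
            # Control-variate-corrected local directions; setting all
            # control variates to zero recovers plain FSGDA local steps.
            xi -= eta_c * (gx - clients[i].cv_x + cv_x)  # descent on x
            yi += eta_c * (gy - clients[i].cv_y + cv_y)  # ascent on y
        deltas_x.append(xi - x)
        deltas_y.append(yi - y)
        # Fresh gradient estimates at the global iterate refresh the
        # control variates of the sampled clients.
        gx0, gy0 = clients[i].stoch_grad(x, y)
        clients[i].cv_x, clients[i].cv_y = gx0, gy0
        grads_x.append(gx0)
        grads_y.append(gy0)

    # Server-side learning rate applied to the averaged local model changes.
    x_new = x + eta_s * np.mean(deltas_x, axis=0)
    y_new = y + eta_s * np.mean(deltas_y, axis=0)
    # Server control variates built from the sampled clients' estimates.
    return x_new, y_new, np.mean(grads_x, axis=0), np.mean(grads_y, axis=0)
```

In this sketch the two learning rates play distinct roles: `eta_c` controls how aggressively each sampled client moves during its local descent-ascent steps, while `eta_s` controls how much of the averaged local change the server applies per round.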
Related papers
- Byzantine-resilient Federated Learning Employing Normalized Gradients on Non-IID Datasets [23.640506243685863]
In practical federated learning, the presence of malicious attacks and data heterogeneity often introduces biases into the learning process.
We propose a Federated Normalized Gradients Algorithm (Fed-NGA) that normalizes uploaded local gradients to be unit vectors before aggregation, achieving a time complexity of $\mathcal{O}(pM)$.
arXiv Detail & Related papers (2024-08-18T16:50:39Z) - Solving a Class of Non-Convex Minimax Optimization in Federated Learning [84.98927714326908]
Minimax problems arise throughout machine learning applications, ranging from machine learning training to large-scale learning.
We propose a class of algorithms for non-convex minimax problems that reduces the complexity to $O(\varepsilon^{-6})$.
We prove that FedSGDA-M has the best known sample complexity of $O(\kappa^{2}\epsilon^{-3})$ and the best known communication complexity of $O(\kappa^{2}\epsilon^{-3})$.
arXiv Detail & Related papers (2023-10-05T15:48:41Z) - Communication-Efficient Federated Bilevel Optimization with Local and
Global Lower Level Problems [118.00379425831566]
We propose a communication-efficient algorithm, named FedBiOAcc.
We prove that FedBiOAcc-Local converges at the same rate for this type of problem.
Empirical results show superior performance of our algorithms.
arXiv Detail & Related papers (2023-02-13T21:28:53Z) - FedDA: Faster Framework of Local Adaptive Gradient Methods via Restarted
Dual Averaging [104.41634756395545]
Federated learning (FL) is an emerging learning paradigm to tackle massively distributed data.
We propose FedDA, a novel framework for local adaptive gradient methods.
We show that FedDA-MVR is the first adaptive FL algorithm that achieves this rate.
arXiv Detail & Related papers (2023-02-13T05:10:30Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on our momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - INTERACT: Achieving Low Sample and Communication Complexities in
Decentralized Bilevel Learning over Networks [24.02913189682224]
Decentralized bilevel optimization problems have received increasing attention in the networking and machine learning communities.
Low sample and communication complexities are two fundamental challenges that remain under-explored.
Our work is the first that achieves both low sample and communication complexities for solving decentralized bilevel optimization problems over networks.
arXiv Detail & Related papers (2022-07-27T04:19:28Z) - Federated Minimax Optimization: Improved Convergence Analyses and
Algorithms [32.062312674333775]
We consider nonconvex minimax optimization, which is gaining prominence in many modern machine learning applications such as GANs.
We provide a novel and tighter analysis of the algorithm, improving the convergence and communication guarantees in the existing literature.
arXiv Detail & Related papers (2022-03-09T16:21:31Z) - DASHA: Distributed Nonconvex Optimization with Communication
Compression, Optimal Oracle Complexity, and No Client Synchronization [77.34726150561087]
We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems.
Unlike MARINA, the new methods DASHA and DASHA-MVR send compressed vectors only and never synchronize the nodes, which makes them more practical for federated learning.
arXiv Detail & Related papers (2022-02-02T20:10:40Z) - STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal
Sample and Communication Complexities for Federated Learning [58.6792963686231]
Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data.
It is not clear how to choose the WNs' update directions, the minibatch sizes, and the local update frequency.
We show that there is a trade-off curve between local update frequencies and local minibatch sizes, on which the above complexities can be maintained.
arXiv Detail & Related papers (2021-06-19T06:13:45Z) - Multi-Agent Off-Policy TD Learning: Finite-Time Analysis with
Near-Optimal Sample Complexity and Communication Complexity [13.100926925535578]
We develop two decentralized TD with correction (TDC) algorithms for multi-agent off-policy TD learning.
Our algorithms preserve full privacy of the actions, policies and rewards of the agents, and adopt mini-batch sampling to reduce the sampling variance and communication frequency.
arXiv Detail & Related papers (2021-03-24T12:48:08Z) - O(1) Communication for Distributed SGD through Two-Level Gradient
Averaging [0.0]
We introduce a strategy called two-level gradient averaging (A2SGD) to consolidate all gradients down to merely two local averages per worker.
Our theoretical analysis shows that A2SGD converges similarly to the default distributed SGD algorithm.
arXiv Detail & Related papers (2020-06-12T18:20:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.