A Communication-efficient Algorithm with Linear Convergence for
Federated Minimax Learning
- URL: http://arxiv.org/abs/2206.01132v2
- Date: Tue, 6 Jun 2023 16:17:23 GMT
- Title: A Communication-efficient Algorithm with Linear Convergence for
Federated Minimax Learning
- Authors: Zhenyu Sun, Ermin Wei
- Abstract summary: We study a large-scale multi-agent minimax optimization problem, which models Generative Adversarial Networks (GANs).
The overall objective is a sum of agents' private local objective functions.
We show that FedGDA-GT converges linearly with a constant stepsize to a global $\epsilon$-approximation solution.
- Score: 1.713291434132985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study a large-scale multi-agent minimax optimization
problem, which models many interesting applications in statistical learning and
game theory, including Generative Adversarial Networks (GANs). The overall
objective is a sum of agents' private local objective functions. We first
analyze an important special case, empirical minimax problem, where the overall
objective approximates a true population minimax risk by statistical samples.
We provide generalization bounds for learning with this objective through
Rademacher complexity analysis. Then, we focus on the federated setting, where
agents can perform local computation and communicate with a central server.
Most existing federated minimax algorithms either require communication per
iteration or lack performance guarantees with the exception of Local Stochastic
Gradient Descent Ascent (SGDA), a multiple-local-update descent ascent
algorithm which guarantees convergence under a diminishing stepsize. By
analyzing Local SGDA under the ideal condition of no gradient noise, we show
that generally it cannot guarantee exact convergence with constant stepsizes
and thus suffers from slow rates of convergence. To tackle this issue, we
propose FedGDA-GT, an improved Federated (Fed) Gradient Descent Ascent (GDA)
method based on Gradient Tracking (GT). When local objectives are Lipschitz
smooth and strongly-convex-strongly-concave, we prove that FedGDA-GT converges
linearly with a constant stepsize to a global $\epsilon$-approximation solution
with $\mathcal{O}(\log (1/\epsilon))$ rounds of communication, which matches
the time complexity of the centralized GDA method. Finally, we numerically show
that FedGDA-GT outperforms Local SGDA.
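The abstract only states FedGDA-GT's ingredients: multiple local GDA updates per round plus a gradient-tracking correction exchanged through the server. Below is a minimal sketch of that pattern, assuming synthetic strongly-convex-strongly-concave quadratic local objectives and the standard gradient-tracking correction grad_i(x, y) - grad_i(x_bar, y_bar) + grad(x_bar, y_bar); the stepsize, number of local steps, and objectives are illustrative choices, not taken from the paper.

```python
import numpy as np

# Minimal sketch (not the authors' reference implementation) of a federated
# gradient-descent-ascent loop with a gradient-tracking correction. Each agent i
# holds a strongly-convex-strongly-concave quadratic objective
#   f_i(x, y) = 0.5*||x - a_i||^2 - 0.5*||y - b_i||^2 + x @ C_i @ y,
# and the global objective is the average f(x, y) = (1/n) * sum_i f_i(x, y).

rng = np.random.default_rng(0)
n_agents, dim, local_steps, rounds, eta = 8, 5, 10, 50, 0.05

A = rng.normal(size=(n_agents, dim))             # shifts a_i
B = rng.normal(size=(n_agents, dim))             # shifts b_i
C = 0.1 * rng.normal(size=(n_agents, dim, dim))  # coupling matrices C_i

def grad_i(i, x, y):
    """Local gradients of agent i's objective w.r.t. x and y."""
    gx = (x - A[i]) + C[i] @ y
    gy = -(y - B[i]) + C[i].T @ x
    return gx, gy

def global_grad(x, y):
    """Average of all local gradients (what the server can aggregate)."""
    gs = [grad_i(i, x, y) for i in range(n_agents)]
    return np.mean([g[0] for g in gs], axis=0), np.mean([g[1] for g in gs], axis=0)

x_bar, y_bar = np.zeros(dim), np.zeros(dim)
for r in range(rounds):
    # Communication round: server aggregates local gradients at (x_bar, y_bar)
    # and broadcasts the global gradient used for the tracking correction.
    gx_bar, gy_bar = global_grad(x_bar, y_bar)
    xs, ys = [], []
    for i in range(n_agents):
        gx_ref, gy_ref = grad_i(i, x_bar, y_bar)
        x, y = x_bar.copy(), y_bar.copy()
        for _ in range(local_steps):
            gx, gy = grad_i(i, x, y)
            # Gradient-tracking correction: remove the local gradient's bias
            # at the round's reference point and add the global gradient.
            x = x - eta * (gx - gx_ref + gx_bar)
            y = y + eta * (gy - gy_ref + gy_bar)
        xs.append(x); ys.append(y)
    # Server averages the local iterates to form the next reference point.
    x_bar, y_bar = np.mean(xs, axis=0), np.mean(ys, axis=0)

gx, gy = global_grad(x_bar, y_bar)
print("final global gradient norm:", np.linalg.norm(np.concatenate([gx, gy])))
```

With the correction term, each agent's local update direction agrees with the global gradient at the round's reference point, which is what permits a constant stepsize in this sketch; dropping the correction leaves a Local GDA variant that, as the abstract notes for Local SGDA, generally stalls at a nonzero error under constant stepsizes.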
Related papers
- Stability and Generalization for Distributed SGDA [70.97400503482353]
We propose a stability-based generalization analysis framework for Distributed-SGDA.
We conduct a comprehensive analysis of stability error, generalization gap, and population risk across different metrics.
Our theoretical results reveal the trade-off between the generalization gap and optimization error.
arXiv Detail & Related papers (2024-11-14T11:16:32Z) - Decentralized Riemannian Algorithm for Nonconvex Minimax Problems [82.50374560598493]
Minimax algorithms for neural networks have been developed to solve many problems.
In this paper, we propose two types of minimax algorithms.
For the stochastic setting, we propose DRSGDA and prove a gradient-complexity guarantee for our method.
arXiv Detail & Related papers (2023-02-08T01:42:45Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - SGDA with shuffling: faster convergence for nonconvex-P{\L} minimax
optimization [18.668531108219415]
We present a theoretical analysis for solving minimax optimization problems with stochastic gradient descent-ascent (SGDA) under random reshuffling (SGDA-RR).
We analyze both simultaneous and alternating SGDA-RR for nonconvex objectives with Polyak-Łojasiewicz (PŁ) geometry.
Our rates also extend to mini-batch GDA-RR, recovering the known rates for full-batch gradient descent-ascent (GDA).
Finally, we present a comprehensive lower bound for two-time-scale GDA, which matches the full-batch rate in the primal-PŁ setting.
arXiv Detail & Related papers (2022-10-12T08:05:41Z) - Federated Minimax Optimization: Improved Convergence Analyses and
Algorithms [32.062312674333775]
We consider nonconvex minimax optimization, which is gaining prominence in many modern machine learning applications such as GANs.
We provide a novel and tighter analysis of the algorithm, improving the convergence and communication guarantees in the existing literature.
arXiv Detail & Related papers (2022-03-09T16:21:31Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - Local Stochastic Gradient Descent Ascent: Convergence Analysis and
Communication Efficiency [15.04034188283642]
Local SGD is a promising approach to overcome the communication overhead in distributed learning.
We show that local SGDA can provably optimize distributed minimax problems in both homogeneous and heterogeneous data settings.
arXiv Detail & Related papers (2021-02-25T20:15:18Z) - Proximal Gradient Descent-Ascent: Variable Convergence under K{\L}
Geometry [49.65455534654459]
Gradient descent-ascent (GDA) has been widely applied to solve minimax optimization problems.
This paper fills such a gap by studying the convergence of proximal GDA under Kurdyka-Łojasiewicz (KŁ) geometry.
arXiv Detail & Related papers (2021-02-09T05:35:53Z) - An improved convergence analysis for decentralized online stochastic
non-convex optimization [17.386715847732468]
In this paper, we analyze a gradient-tracking (GT) based method under a Łojasiewicz-type condition.
The results are not only immediately applicable but also match the currently known best convergence rates.
arXiv Detail & Related papers (2020-08-10T15:29:13Z) - A Unified Theory of Decentralized SGD with Changing Topology and Local
Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
arXiv Detail & Related papers (2020-03-23T17:49:15Z) - Replica Exchange for Non-Convex Optimization [4.421561004829125]
Gradient descent (GD) is known to converge quickly for convex objective functions, but it can be trapped at local minima.
Langevin dynamics (LD) can explore the state space and find global minima, but in order to give accurate estimates, LD needs to run with a small discretization step size and a weak stochastic force.
This paper shows that these two algorithms and their non-swapping variants can "collaborate" through a simple exchange mechanism.
arXiv Detail & Related papers (2020-01-23T03:13:19Z)