Local Stochastic Gradient Descent Ascent: Convergence Analysis and
Communication Efficiency
- URL: http://arxiv.org/abs/2102.13152v1
- Date: Thu, 25 Feb 2021 20:15:18 GMT
- Title: Local Stochastic Gradient Descent Ascent: Convergence Analysis and
Communication Efficiency
- Authors: Yuyang Deng, Mehrdad Mahdavi
- Abstract summary: Local SGD is a promising approach to overcome the communication overhead in distributed learning.
We show that local SGDA can provably optimize distributed minimax problems under both homogeneous and heterogeneous data distributions.
- Score: 15.04034188283642
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Local SGD is a promising approach to overcome the communication overhead in
distributed learning by reducing the synchronization frequency among worker
nodes. Despite the recent theoretical advances of local SGD in empirical risk
minimization, the efficiency of its counterpart in minimax optimization remains
unexplored. Motivated by large scale minimax learning problems, such as
adversarial robust learning and training generative adversarial networks
(GANs), we propose local Stochastic Gradient Descent Ascent (local SGDA), where
the primal and dual variables can be trained locally and averaged periodically
to significantly reduce the number of communications. We show that local SGDA
can provably optimize distributed minimax problems under both homogeneous and
heterogeneous data distributions with a reduced number of communications, and we establish
convergence rates under strongly-convex-strongly-concave and
nonconvex-strongly-concave settings. In addition, we propose a novel variant,
local SGDA+, to solve nonconvex-nonconcave problems. We give corroborating
empirical evidence on different distributed minimax problems.
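
As a rough illustration of the scheme described in the abstract (a sketch on an assumed toy problem, not the authors' implementation), the following simulation runs local SGDA on a strongly-convex-strongly-concave quadratic: each worker performs K local stochastic descent steps on the primal variable and ascent steps on the dual variable, and the workers average both variables once per communication round.

```python
import numpy as np

# Toy distributed saddle-point problem (illustrative, not the paper's code):
# worker i holds f_i(x, y) = 0.5*||x||^2 + x^T A_i y - 0.5*||y||^2,
# strongly convex in x and strongly concave in y; distinct A_i mimic
# heterogeneous data. The unique saddle point is (0, 0).
rng = np.random.default_rng(0)
M, d, K, R, lr = 4, 5, 10, 50, 0.05  # workers, dimension, local steps, rounds, step size
A = [rng.normal(scale=0.1, size=(d, d)) for _ in range(M)]

def grad_x(i, x, y):
    return x + A[i] @ y + 0.01 * rng.normal(size=d)    # stochastic primal gradient

def grad_y(i, x, y):
    return A[i].T @ x - y + 0.01 * rng.normal(size=d)  # stochastic dual gradient

xs = [np.ones(d) for _ in range(M)]  # each worker's copy of the primal variable
ys = [np.ones(d) for _ in range(M)]  # each worker's copy of the dual variable

for r in range(R):
    for i in range(M):
        for _ in range(K):           # K local steps with no communication
            gx, gy = grad_x(i, xs[i], ys[i]), grad_y(i, xs[i], ys[i])
            xs[i] -= lr * gx         # descent on the primal variable
            ys[i] += lr * gy         # ascent on the dual variable
    x_bar = sum(xs) / M              # one communication round: average
    y_bar = sum(ys) / M              # both primal and dual variables
    xs = [x_bar.copy() for _ in range(M)]
    ys = [y_bar.copy() for _ in range(M)]

print("distance to saddle point:", np.linalg.norm(x_bar), np.linalg.norm(y_bar))
```

Communication happens only once every K stochastic updates, which is where the reduction in the number of communications comes from.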
Related papers
- Stability and Generalization for Distributed SGDA [70.97400503482353]
We propose a stability-based generalization analysis framework for Distributed SGDA.
We conduct a comprehensive analysis of stability error, generalization gap, and population risk across different metrics.
Our theoretical results reveal the trade-off between the generalization gap and optimization error.
arXiv Detail & Related papers (2024-11-14T11:16:32Z)
- The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication [37.210933391984014]
Local SGD is a popular optimization method in distributed learning, often outperforming other algorithms in practice.
We provide new lower bounds for local SGD under existing first-order data heterogeneity assumptions.
We also show the min-max optimality of accelerated mini-batch SGD for several problem classes.
arXiv Detail & Related papers (2024-05-19T20:20:03Z)
- Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification in the Presence of Data Heterogeneity [60.791736094073]
Communication overhead has become one of the major bottlenecks in the distributed training of deep neural networks.
We propose a magnitude-driven sparsification scheme, which addresses the non-convergence issue of SIGNSGD.
The proposed scheme is validated through experiments on Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
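
As a hedged sketch of the idea (the published operator may differ in how it selects and scales coordinates), a magnitude-aware sign compressor transmits signs only for the largest-magnitude coordinates, avoiding the noise-dominated sign flips that hurt plain SIGNSGD under heterogeneous data:

```python
import numpy as np

def magnitude_aware_sign(grad: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical sketch of a magnitude-driven sign compressor: keep the
    signs of the k largest-magnitude coordinates, scaled by their mean
    magnitude, and drop the rest. The paper's exact scheme may differ."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]  # indices of top-k magnitudes
    out = np.zeros_like(grad)
    out[idx] = np.sign(grad[idx]) * np.abs(grad[idx]).mean()
    return out

g = np.array([0.02, -1.5, 0.7, -0.03, 2.1])
print(magnitude_aware_sign(g, k=2))  # -> [ 0.  -1.8  0.   0.   1.8]
```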
arXiv Detail & Related papers (2023-02-19T17:42:35Z)
- Federated Minimax Optimization: Improved Convergence Analyses and Algorithms [32.062312674333775]
We consider nonconvex minimax optimization, which is gaining prominence in many modern machine learning applications such as GANs.
We provide a novel and tighter analysis of an algorithm that improves the convergence and communication guarantees in the existing literature.
arXiv Detail & Related papers (2022-03-09T16:21:31Z)
- Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning [58.79085525115987]
Local methods are one of the promising approaches to reduce communication time.
We show that the communication complexity of local methods is better than that of non-local methods when the heterogeneity of the local datasets is smaller than the smoothness of the local loss.
arXiv Detail & Related papers (2022-02-12T15:12:17Z)
- Local AdaGrad-Type Algorithm for Stochastic Convex-Concave Minimax Problems [80.46370778277186]
Large scale convex-concave minimax problems arise in numerous applications, including game theory, robust training, and training of generative adversarial networks.
We develop a communication-efficient distributed extragradient algorithm, LocalAdaSient, with an adaptive learning rate suitable for solving convex-concave minimax problems in the Parameter-Server model.
We demonstrate its efficacy through several experiments in both the homogeneous and heterogeneous settings.
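
For background, adaptive distributed methods of this kind build on the classic extragradient step, which first takes an extrapolation half-step and then updates using gradients evaluated at the half-point; the toy sketch below shows only that core step, omitting LocalAdaSient's adaptive learning rate and periodic averaging:

```python
# Classic toy minimax problem f(x, y) = x * y (illustrative): plain
# simultaneous gradient descent ascent spirals away from the saddle point
# (0, 0), while the extragradient update below converges to it.
x, y, eta = 1.0, 1.0, 0.3
for _ in range(200):
    x_half = x - eta * y            # extrapolation step, using df/dx = y
    y_half = y + eta * x            # extrapolation step, using df/dy = x
    x, y = x - eta * y_half, y + eta * x_half  # update with half-point gradients
print(f"extragradient iterate: x = {x:.1e}, y = {y:.1e}")  # both near zero
```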
arXiv Detail & Related papers (2021-06-18T09:42:05Z)
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
Communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new distributed error feedback (DEF) algorithm, which shows better convergence than error feedback for nonconvex distributed problems.
We also propose DEFA to accelerate the generalization of DEF, which shows better generalization bounds than DEF.
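
As a minimal sketch of the error-feedback loop such methods build on (the detached correction that distinguishes DEF is not reproduced here; the toy loss and names are illustrative), each worker re-injects the mass its compressor dropped into the next transmitted gradient:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_sparsify(v: np.ndarray, p: float) -> np.ndarray:
    # Keep each coordinate independently with probability p (no rescaling);
    # the feedback loop below compensates for the dropped mass over time.
    return np.where(rng.random(v.shape) < p, v, 0.0)

d, lr, p, steps = 10, 0.1, 0.5, 200
w = np.ones(d)            # model parameters (single worker shown)
residual = np.zeros(d)    # accumulated compression error

for _ in range(steps):
    grad = 2.0 * w                       # gradient of the toy loss ||w||^2
    corrected = grad + residual          # error-feedback correction
    msg = random_sparsify(corrected, p)  # the only vector communicated
    residual = corrected - msg           # remember what the compressor lost
    w -= lr * msg                        # apply the compressed update

print("||w|| after training:", np.linalg.norm(w))
```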
arXiv Detail & Related papers (2020-04-11T03:50:59Z)
- A Unified Theory of Decentralized SGD with Changing Topology and Local Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
arXiv Detail & Related papers (2020-03-23T17:49:15Z)