Accelerating Distributed Optimization: A Primal-Dual Perspective on Local Steps
- URL: http://arxiv.org/abs/2407.02689v2
- Date: Sun, 11 Aug 2024 05:10:43 GMT
- Title: Accelerating Distributed Optimization: A Primal-Dual Perspective on Local Steps
- Authors: Junchi Yang, Murat Yildirim, Qiu Feng,
- Abstract summary: In distributed machine learning, linear variables across multiple agents with different data poses significant challenges.
In this paper we show that a framework that achieves the Lagrangian convergence on the primal variable requires no inter-agent communication.
- Score: 4.471962177124311
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In distributed machine learning, efficient training across multiple agents with different data distributions poses significant challenges. Even with a centralized coordinator, current algorithms that achieve optimal communication complexity typically require either large minibatches or compromise on gradient complexity. In this work, we tackle both centralized and decentralized settings across strongly convex, convex, and nonconvex objectives. We first demonstrate that a basic primal-dual method, (Accelerated) Gradient Ascent Multiple Stochastic Gradient Descent (GA-MSGD), applied to the Lagrangian of distributed optimization inherently incorporates local updates, because the inner loops of running Stochastic Gradient Descent on the primal variable require no inter-agent communication. Notably, for strongly convex objectives, (Accelerated) GA-MSGD achieves linear convergence in communication rounds despite the Lagrangian being only linear in the dual variables. This is due to a structural property where the dual variable is confined to the span of the coupling matrix, rendering the dual problem strongly concave. When integrated with the Catalyst framework, our approach achieves nearly optimal communication complexity across various settings without the need for minibatches.
Related papers
- A Communication and Computation Efficient Fully First-order Method for Decentralized Bilevel Optimization [16.020878731214083]
This paper introduces a fully first-order decentralized method for decentralized Bilevel optimization, $textC2$DFB.
$textC2$DFB is both compute- and communicate-efficient.
arXiv Detail & Related papers (2024-10-18T02:00:45Z) - Near-Optimal Decentralized Momentum Method for Nonconvex-PL Minimax
Problems [39.197569803430646]
Minimax optimization plays an important role in many machine learning tasks such as adversarial networks (GANs) and adversarial training.
Although recently a wide variety of optimization methods have been proposed to solve the minimax problems, most of them ignore the distributed setting.
arXiv Detail & Related papers (2023-04-21T11:38:41Z) - DIAMOND: Taming Sample and Communication Complexities in Decentralized
Bilevel Optimization [27.317118892531827]
We develop a new decentralized bilevel optimization called DIAMOND (decentralized single-timescale approximation with momentum and gradient-tracking)
We show that the DIAMOND enjoys $mathcalO(epsilon-3/2)$ in sample and communication complexities for achieving an $epsilon$-stationary solution.
arXiv Detail & Related papers (2022-12-05T15:58:00Z) - A Variance-Reduced Stochastic Gradient Tracking Algorithm for
Decentralized Optimization with Orthogonality Constraints [7.028225540638832]
We propose a novel algorithm for decentralized optimization with orthogonality constraints.
VRSGT is the first algorithm for decentralized optimization with orthogonality constraints that reduces both sampling and communication complexities simultaneously.
In the numerical experiments, VRGTS has a promising performance in a real-world autonomous sample.
arXiv Detail & Related papers (2022-08-29T14:46:44Z) - Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD
for Communication Efficient Nonconvex Distributed Learning [58.79085525115987]
Local methods are one of the promising approaches to reduce communication time.
We show that the communication complexity is better than non-local methods when the local datasets is smaller than the smoothness local loss.
arXiv Detail & Related papers (2022-02-12T15:12:17Z) - Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z) - Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning based on Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable is with sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs)
arXiv Detail & Related papers (2021-06-20T04:28:20Z) - Local AdaGrad-Type Algorithm for Stochastic Convex-Concave Minimax
Problems [80.46370778277186]
Large scale convex-concave minimax problems arise in numerous applications, including game theory, robust training, and training of generative adversarial networks.
We develop a communication-efficient distributed extragrad algorithm, LocalAdaSient, with an adaptive learning rate suitable for solving convex-concave minimax problem in the.
Server model.
We demonstrate its efficacy through several experiments in both the homogeneous and heterogeneous settings.
arXiv Detail & Related papers (2021-06-18T09:42:05Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - A Unified Theory of Decentralized SGD with Changing Topology and Local
Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
arXiv Detail & Related papers (2020-03-23T17:49:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.