Communication Acceleration of Local Gradient Methods via an Accelerated
Primal-Dual Algorithm with Inexact Prox
- URL: http://arxiv.org/abs/2207.03957v1
- Date: Fri, 8 Jul 2022 15:24:13 GMT
- Title: Communication Acceleration of Local Gradient Methods via an Accelerated
Primal-Dual Algorithm with Inexact Prox
- Authors: Abdurakhmon Sadiev and Dmitry Kovalev and Peter Richtárik
- Abstract summary: We propose an alternative algorithm which obtains the same communication acceleration as Mishchenko et al. (2022).
Our approach is based on the celebrated method of Chambolle and Pock (2011), with several nontrivial modifications.
Our method can be applied to optimization over a connected network, and we obtain theoretical improvements here as well.
- Score: 11.564643329398438
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inspired by a recent breakthrough of Mishchenko et al. (2022), who for the
first time showed that local gradient steps can lead to provable communication
acceleration, we propose an alternative algorithm which obtains the same
communication acceleration as their method (ProxSkip). Our approach is very
different, however: it is based on the celebrated method of Chambolle and Pock
(2011), with several nontrivial modifications: i) we allow for an inexact
computation of the prox operator of a certain smooth strongly convex function
via a suitable gradient-based method (e.g., GD, Fast GD or FSFOM), ii) we
perform a careful modification of the dual update step in order to retain
linear convergence. Our general results offer the new state-of-the-art rates
for the class of strongly convex-concave saddle-point problems with bilinear
coupling characterized by the absence of smoothness in the dual function. When
applied to federated learning, we obtain a theoretically better alternative to
ProxSkip: our method requires fewer local steps ($O(\kappa^{1/3})$ or
$O(\kappa^{1/4})$, compared to $O(\kappa^{1/2})$ of ProxSkip), and performs a
deterministic number of local steps instead. Like ProxSkip, our method can be
applied to optimization over a connected network, and we obtain theoretical
improvements here as well.
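As a rough orientation, the sketch below writes down the bilinear saddle-point template and one plain (unmodified) Chambolle-Pock step with an inexact primal prox. The notation ($f$, $g$, $\mathbf{K}$, step sizes $\tau$, $\sigma$) is illustrative only and is not taken from the paper; in particular, the paper's method additionally modifies the dual update, which is not shown here.

```latex
% Minimal compilable sketch (illustrative notation, not the paper's own).
\documentclass{article}
\usepackage{amsmath, amssymb}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator{\prox}{prox}
\begin{document}
Strongly convex-concave saddle-point problem with bilinear coupling,
with $f$ smooth and strongly convex and $g$ possibly nonsmooth:
\[
  \min_{x}\ \max_{y}\ \; f(x) + \langle \mathbf{K}x,\, y\rangle - g(y).
\]
Generic Chambolle-Pock step in which the primal prox is computed only
approximately by an inner gradient-based solver (e.g.\ GD or Fast GD);
the paper's modified dual step is not shown:
\begin{align*}
  x^{k+1} &\approx \prox_{\tau f}\!\big(x^{k} - \tau \mathbf{K}^{\top} y^{k}\big)
           = \argmin_{x}\Big\{ f(x)
             + \tfrac{1}{2\tau}\big\|x - \big(x^{k} - \tau \mathbf{K}^{\top} y^{k}\big)\big\|^{2} \Big\}, \\
  y^{k+1} &= \prox_{\sigma g}\!\big(y^{k} + \sigma \mathbf{K}\,(2x^{k+1} - x^{k})\big).
\end{align*}
\end{document}
```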
Related papers
- Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with Smoothed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z)
- Adaptive Proximal Gradient Method for Convex Optimization [18.681222155879656]
We explore two fundamental first-order algorithms in convex optimization, namely gradient descent (GD) and the proximal gradient method (ProxGD).
Our focus is on making these algorithms entirely adaptive by leveraging local curvature information of smooth functions.
We propose adaptive versions of GD and ProxGD that are based on observed gradient differences and, thus, have no added computational costs.
arXiv Detail & Related papers (2023-08-04T11:37:08Z)
- FedDA: Faster Framework of Local Adaptive Gradient Methods via Restarted Dual Averaging [104.41634756395545]
Federated learning (FL) is an emerging learning paradigm to tackle massively distributed data.
We propose FedDA, a novel framework for local adaptive gradient methods.
We show that FedDA-MVR is the first adaptive FL algorithm that achieves this rate.
arXiv Detail & Related papers (2023-02-13T05:10:30Z)
- Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy [6.170898159041277]
We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy.
Our algorithm guarantees convergence at a rate matching that of gradient descent on a $\delta$-smooth objective.
Our algorithm has many potential applications in machine learning, and provides a principled means of leveraging synthetic data, physics simulators, mixed public and private data, and more.
arXiv Detail & Related papers (2023-02-07T15:50:49Z)
- Min-Max Optimization Made Simple: Approximating the Proximal Point Method via Contraction Maps [77.8999425439444]
We present a first-order method that admits near-optimal convergence rates for convex/concave min-max problems.
Our work is based on the fact that the update rule of the Proximal Point method can be approximated, up to any desired accuracy, via contraction maps.
arXiv Detail & Related papers (2023-01-10T12:18:47Z)
- GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity [3.222802562733787]
We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing the clients to perform multiple local gradient-type training steps prior to communication (a minimal sketch of this local-steps pattern appears after this list).
We prove that our modified method, which we call GradSkip, converges linearly under the same assumptions.
arXiv Detail & Related papers (2022-10-28T20:59:06Z)
- Momentum-inspired Low-Rank Coordinate Descent for Diagonally Constrained SDPs [12.7944665592057]
We present a novel, practical, and provable approach for solving constrained semidefinite programming (SDP) problems at scale using accelerated non-convex programming.
arXiv Detail & Related papers (2021-06-16T13:35:40Z)
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
- Acceleration Methods [57.202881673406324]
We first use quadratic optimization problems to introduce two key families of acceleration methods.
We discuss momentum methods in detail, starting with the seminal work of Nesterov.
We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates.
arXiv Detail & Related papers (2021-01-23T17:58:25Z)
- A Unified Analysis of First-Order Methods for Smooth Games via Integral Quadratic Constraints [10.578409461429626]
In this work, we adapt the theory of integral quadratic constraints to first-order methods for smooth and strongly monotone games.
We provide, for the first time, a global convergence rate for the negative momentum method (NM) with a complexity of $\mathcal{O}(\kappa^{1.5})$, which matches its known lower bound.
We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch.
arXiv Detail & Related papers (2020-09-23T20:02:00Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
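Several of the entries above (ProxSkip, GradSkip, FedDA), as well as the main paper, revolve around clients taking multiple local gradient steps between communication rounds. Below is a minimal, hypothetical NumPy sketch of just that plain local-training/averaging pattern (FedAvg-style); it omits the control-variate corrections and communication schedules that give those methods their acceleration, and all names, objectives, and hyperparameters in it are made up for illustration.

```python
# Minimal sketch of the "local gradient steps between communications" pattern
# shared by the local-gradient methods listed above. This is the plain
# FedAvg-style loop only; it deliberately omits the control-variate correction
# and communication skipping used by ProxSkip/GradSkip. All objectives and
# hyperparameters here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
d, n_clients = 10, 5

# Hypothetical local objectives: f_i(x) = 0.5 * ||A_i x - b_i||^2
A = [rng.standard_normal((20, d)) for _ in range(n_clients)]
b = [rng.standard_normal(20) for _ in range(n_clients)]

def local_grad(i, x):
    """Gradient of the i-th client's quadratic objective."""
    return A[i].T @ (A[i] @ x - b[i])

def run(rounds=50, local_steps=10, lr=1e-2):
    x = np.zeros(d)                        # global model held by the server
    for _ in range(rounds):                # one communication per round
        local_models = []
        for i in range(n_clients):
            xi = x.copy()
            for _ in range(local_steps):   # local steps, no communication
                xi -= lr * local_grad(i, xi)
            local_models.append(xi)
        x = np.mean(local_models, axis=0)  # server averages the local models
    return x

if __name__ == "__main__":
    x_hat = run()
    avg_loss = np.mean(
        [0.5 * np.linalg.norm(A[i] @ x_hat - b[i]) ** 2 for i in range(n_clients)]
    )
    print(f"average local loss after training: {avg_loss:.4f}")
```

The methods above improve on this plain loop by correcting the local drift and by controlling how many local steps are taken between communications, e.g. $O(\kappa^{1/2})$ for ProxSkip versus $O(\kappa^{1/3})$ or $O(\kappa^{1/4})$ for the method of the main paper, as stated in the abstract.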