The Role of Local Steps in Local SGD
- URL: http://arxiv.org/abs/2203.06798v1
- Date: Mon, 14 Mar 2022 00:54:43 GMT
- Title: The Role of Local Steps in Local SGD
- Authors: Tiancheng Qin, S. Rasoul Etesami, C\'esar A. Uribe
- Abstract summary: We consider the distributed optimization problem where $n$ agents want to minimize a global function given by the sum of their local functions.
We study the effect of local steps on the convergence rate and the communication complexity of Local SGD.
- Score: 2.3572498744567123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the distributed stochastic optimization problem where $n$ agents
want to minimize a global function given by the sum of agents' local functions.
We focus on the heterogeneous setting when agents' local functions are defined
over non-i.i.d. data sets and assume that agents have access to the global
function through noisy gradients of their local functions. We study the Local
SGD method, where agents perform a number of local stochastic gradient steps
and occasionally communicate with a central node to improve their local
optimization tasks. We analyze the effect of local steps on the convergence
rate and the communication complexity of Local SGD. Existing literature assumes
a fixed number of local steps across all communication rounds. However, we
allow the number of local steps during the $i$-th communication round, denoted
by $H_i$, to be arbitrary. Our main contribution is to characterize the
convergence rate of Local SGD as a function of $\{H_i\}_{i=1}^R$ under various
settings of strongly convex, convex, and nonconvex local functions, where $R$
is the total number of communication rounds. Based on this characterization, we
provide sufficient conditions on the sequence $\{H_i\}_{i=1}^R$ such that Local
SGD can achieve linear speed-up with respect to the number of workers.
Furthermore, we propose a new communication strategy with increasing local
steps that is superior to existing communication strategies for strongly convex local
functions. On the other hand, for convex and nonconvex local functions, we
argue that fixed local steps are the best communication strategy for Local SGD
and recover state-of-the-art convergence rate results. Finally, we justify our
theoretical results through extensive numerical experiments.
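As a concrete illustration of the setup described in the abstract, the sketch below implements a minimal Local SGD loop in which communication round $i$ performs $H_i$ local stochastic gradient steps before a central node averages the agents' models. It is only a toy sketch under our own assumptions: the function names, the quadratic losses, the step size, and the increasing schedule $H_i = i$ (one of the schedules the paper considers for strongly convex objectives) are illustrative choices, not the authors' code.

```python
import numpy as np

def local_sgd(workers, x0, schedule, lr=0.1):
    """Minimal sketch of Local SGD with a per-round schedule {H_i}.

    workers  : list of callables; workers[k](x) returns a stochastic
               gradient of agent k's local function at x.
    x0       : initial iterate (NumPy array), shared by all agents.
    schedule : sequence of ints; schedule[i] = H_i, the number of local
               steps in communication round i.
    lr       : constant step size (for illustration only).
    """
    x_global = x0.copy()
    for H_i in schedule:                      # R communication rounds
        local_iterates = []
        for grad_k in workers:                # each agent runs in parallel
            x_k = x_global.copy()
            for _ in range(H_i):              # H_i local stochastic steps
                x_k -= lr * grad_k(x_k)
            local_iterates.append(x_k)
        # communication round: the central node averages the local models
        x_global = np.mean(local_iterates, axis=0)
    return x_global

# Toy example (our assumption): n = 4 agents with heterogeneous quadratic
# losses f_k(x) = 0.5 * ||x - b_k||^2 and noisy gradients, run with an
# increasing schedule H_i = i.
rng = np.random.default_rng(0)
targets = [rng.normal(size=5) for _ in range(4)]
workers = [lambda x, b=b: (x - b) + 0.01 * rng.normal(size=x.shape)
           for b in targets]
schedule = list(range(1, 21))                 # H_i = i for i = 1, ..., 20
x_hat = local_sgd(workers, np.zeros(5), schedule)
print(np.linalg.norm(x_hat - np.mean(targets, axis=0)))
```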
Related papers
- Communication-Efficient Federated Group Distributionally Robust Optimization [49.14751984342068]
Federated learning faces challenges due to the heterogeneity in data volumes and distributions at different clients.
Existing approaches to address this issue based on group distributionally robust optimization (GDRO) often lead to high communication and sample complexity.
This work introduces algorithms tailored for communication-efficient Federated Group Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-10-08T21:07:53Z) - FedSpeed: Larger Local Interval, Less Communication Round, and Higher
Generalization Accuracy [84.45004766136663]
Federated learning is an emerging distributed machine learning framework.
It suffers from non-vanishing biases introduced by inconsistent local optima and from client drift caused by local over-fitting.
We propose a novel and practical method, FedSpeed, to alleviate the negative impacts posed by these problems.
arXiv Detail & Related papers (2023-02-21T03:55:29Z) - On the Performance of Gradient Tracking with Local Updates [10.14746251086461]
We show that LU-GT has the same communication complexity but allows arbitrary network topologies.
Numerical examples reveal that local updates can lower communication costs in certain regimes.
arXiv Detail & Related papers (2022-10-10T15:13:23Z) - A Communication-efficient Algorithm with Linear Convergence for
Federated Minimax Learning [1.713291434132985]
We study a large-scale multi-agent minimax optimization problem that models Generative Adversarial Networks (GANs).
The overall objective is a sum of agents' private local objective functions.
We show that FedGDA-GT converges linearly with a constant stepsize to a global $\epsilon$-GDA solution.
arXiv Detail & Related papers (2022-06-02T16:31:16Z) - Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD
for Communication Efficient Nonconvex Distributed Learning [58.79085525115987]
Local methods are one of the promising approaches to reduce communication time.
We show that the communication complexity is better than that of non-local methods when the heterogeneity of the local datasets is smaller than the smoothness of the local loss.
arXiv Detail & Related papers (2022-02-12T15:12:17Z) - An Entropy-guided Reinforced Partial Convolutional Network for Zero-Shot
Learning [77.72330187258498]
We propose a novel Entropy-guided Reinforced Partial Convolutional Network (ERPCNet)
ERPCNet extracts and aggregates localities based on semantic relevance and visual correlations without human-annotated regions.
It not only discovers global-cooperative localities dynamically but also converges faster for policy gradient optimization.
arXiv Detail & Related papers (2021-11-03T11:13:13Z) - Locality Matters: A Scalable Value Decomposition Approach for
Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z) - Local SGD: Unified Theory and New Efficient Methods [8.701566919381223]
We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes.
We develop the first linearly converging local SGD method which does not require any data homogeneity or other strong assumptions.
arXiv Detail & Related papers (2020-11-03T13:02:50Z) - From Local SGD to Local Fixed-Point Methods for Federated Learning [17.04886864943171]
We consider the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting.
We investigate two strategies to achieve such a consensus: one based on a fixed number of local steps, and the other based on randomized computations.
arXiv Detail & Related papers (2020-04-03T09:24:10Z) - A Unified Theory of Decentralized SGD with Changing Topology and Local
Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
arXiv Detail & Related papers (2020-03-23T17:49:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.