Accelerating Decentralized Optimization via Overlapping Local Steps
- URL: http://arxiv.org/abs/2601.01493v1
- Date: Sun, 04 Jan 2026 11:40:22 GMT
- Title: Accelerating Decentralized Optimization via Overlapping Local Steps
- Authors: Yijie Zhou, Shi Pu
- Abstract summary: We propose Overlapping Local Decentralized SGD (OLDSGD), a novel approach to accelerate decentralized training without sacrificing theoretical guarantees. We show that OLDSGD retains the same iteration complexity as standard Local Decentralized SGD while improving per-iteration runtime, yielding faster wall-clock convergence under different levels of communication delays.
- Score: 6.713278402701195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized optimization has emerged as a critical paradigm for distributed learning, enabling scalable training while preserving data privacy through peer-to-peer collaboration. However, existing methods often suffer from communication bottlenecks due to frequent synchronization between nodes. We present Overlapping Local Decentralized SGD (OLDSGD), a novel approach to accelerate decentralized training by computation-communication overlapping, significantly reducing network idle time. With a deliberately designed update, OLDSGD preserves the same average update as Local SGD while avoiding communication-induced stalls. Theoretically, we establish non-asymptotic convergence rates for smooth non-convex objectives, showing that OLDSGD retains the same iteration complexity as standard Local Decentralized SGD while improving per-iteration runtime. Empirical results demonstrate OLDSGD's consistent improvements in wall-clock time convergence under different levels of communication delays. With minimal modifications to existing frameworks, OLDSGD offers a practical solution for faster decentralized learning without sacrificing theoretical guarantees.
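The update described in the abstract, running local SGD steps while the previous round's models are averaged in the background and merging the two afterwards, can be pictured with the rough sketch below. This is a minimal illustration under assumptions not taken from the paper: a ring topology, toy quadratic losses per node, a thread standing in for delayed gossip communication, and helper names (`gossip`, `local_grad`) that are purely illustrative.

```python
# Minimal sketch of computation-communication overlapping; an illustration
# under assumed details (ring topology, quadratic losses, simulated delay),
# not the paper's implementation of OLDSGD.
import threading
import time
import numpy as np

n_nodes, dim, local_steps, rounds, lr = 4, 10, 5, 20, 0.1
rng = np.random.default_rng(0)
targets = rng.normal(size=(n_nodes, dim))   # each node's local optimum
x = np.zeros((n_nodes, dim))                # current model on every node

def local_grad(i, w):
    # Gradient of the toy local loss 0.5 * ||w - targets[i]||^2.
    return w - targets[i]

def gossip(snapshot, out, delay=0.01):
    # Simulated delayed communication: ring-average a snapshot of the models.
    time.sleep(delay)
    for i in range(n_nodes):
        left, right = (i - 1) % n_nodes, (i + 1) % n_nodes
        out[i] = (snapshot[i] + snapshot[left] + snapshot[right]) / 3.0

for r in range(rounds):
    mixed = np.empty_like(x)
    # Start averaging a snapshot of the current models in the background ...
    comm = threading.Thread(target=gossip, args=(x.copy(), mixed))
    comm.start()
    # ... while every node keeps taking local SGD steps instead of idling.
    drift = np.zeros_like(x)
    for _ in range(local_steps):
        for i in range(n_nodes):
            drift[i] -= lr * local_grad(i, x[i] + drift[i])
    comm.join()
    # Merge the communicated average with the locally accumulated progress.
    x = mixed + drift

print("consensus gap:", np.linalg.norm(x - x.mean(axis=0)))
```

The point the abstract emphasizes is that the averaging of the snapshot and the local steps of the same round run concurrently, so the delay inside `gossip` no longer stalls computation; whether this particular merge rule matches OLDSGD's exact update is an assumption of this sketch.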
Related papers
- Asynchronous Decentralized SGD under Non-Convexity: A Block-Coordinate Descent Framework [5.001721812806788]
This paper introduces a refined model of Asynchronous Decentralized Gradient Descent (ADSGD) under practical assumptions. With its simplicity, efficiency in memory and communication, and resilience to communication and computation delays, ADSGD is well-suited for real-world decentralized learning tasks.
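The block-coordinate framing can be pictured as follows: stack all node models into one variable and let each asynchronous wake-up update only that node's block, using possibly stale copies of its neighbors. A rough sketch under assumed details (ring topology, random wake-ups, a coin flip standing in for delayed message delivery), not the paper's algorithm:

```python
# Sketch of the block-coordinate view of asynchronous decentralized gradient
# descent; topology, mixing weights, and the staleness model are assumptions.
import numpy as np

n, dim, lr, events = 4, 8, 0.2, 400
rng = np.random.default_rng(1)
targets = rng.normal(size=(n, dim))         # toy local optima
X = np.zeros((n, dim))                      # stacked variable (x_1, ..., x_n)
stale = X.copy()                            # last copies received from neighbors
# Ring mixing matrix: each row averages a node with its two neighbors.
W = (np.eye(n) + np.roll(np.eye(n), 1, axis=0) + np.roll(np.eye(n), -1, axis=0)) / 3.0

for t in range(events):
    i = rng.integers(n)                     # some node wakes up asynchronously
    mixed_i = W[i] @ stale                  # mix possibly stale neighbor blocks
    grad_i = X[i] - targets[i]              # gradient of 0.5 * ||x_i - target_i||^2
    X[i] = mixed_i - lr * grad_i            # update only block i of X
    if rng.random() < 0.5:                  # the new block reaches neighbors with delay
        stale[i] = X[i]

print("consensus gap:", np.linalg.norm(X - X.mean(axis=0)))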
arXiv Detail & Related papers (2025-05-15T14:06:38Z) - Towards Communication-efficient Federated Learning via Sparse and Aligned Adaptive Optimization [90.08459757321405]
Federated Adam (FedAdam) algorithms suffer from a threefold increase in uplink communication overhead. We propose a novel sparse FedAdam algorithm called FedAdam-SSM, wherein distributed devices sparsify the updates of local model parameters and moment estimates. By minimizing the divergence bound between the model trained by FedAdam-SSM and centralized Adam, we optimize the SSM to mitigate the learning performance degradation caused by sparsification error.
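Read literally, the compression step pairs one sparsification mask with the model update and both Adam moment estimates so the uplink payload stays consistent. A rough sketch of that idea with a top-k mask; the mask construction and all names are illustrative assumptions, not the FedAdam-SSM specification:

```python
# Sketch: sparsify a local update and Adam moments with one shared mask
# before uplink; the top-k mask choice is an assumption for illustration.
import numpy as np

def shared_topk_mask(update, k):
    # Keep the k largest-magnitude coordinates of the model update.
    idx = np.argsort(np.abs(update))[-k:]
    mask = np.zeros(update.shape, dtype=bool)
    mask[idx] = True
    return mask

def sparsify_uplink(delta, m, v, k):
    # Apply the same mask to the update and both Adam moment estimates.
    mask = shared_topk_mask(delta, k)
    return delta * mask, m * mask, v * mask, mask

rng = np.random.default_rng(0)
delta, m, v = rng.normal(size=1000), rng.normal(size=1000), rng.random(1000)
delta_s, m_s, v_s, mask = sparsify_uplink(delta, m, v, k=50)
print("uplink density:", mask.mean())       # only 5% of coordinates are sent
```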
arXiv Detail & Related papers (2024-05-28T07:56:49Z) - Adjacent Leader Decentralized Stochastic Gradient Descent [9.851963228675876]
This work focuses on the decentralized deep learning optimization framework.
We propose Adjacent Leader Decentralized Gradient Descent (AL-DSGD).
Experiments demonstrate that AL-DSGD accelerates the convergence of the decentralized state-of-the-art techniques.
arXiv Detail & Related papers (2024-05-18T20:24:11Z) - Shadowheart SGD: Distributed Asynchronous SGD with Optimal Time Complexity Under Arbitrary Computation and Communication Heterogeneity [85.92481138826949]
We develop a new method, Shadowheart SGD, that provably improves the time complexities of all previous centralized methods.
We also consider the bidirectional setup, where broadcasting from the server to the workers is non-negligible, and develop a corresponding method.
arXiv Detail & Related papers (2024-02-07T12:15:56Z) - Decentralized SGD and Average-direction SAM are Asymptotically Equivalent [101.37242096601315]
Decentralized stochastic gradient descent (D-SGD) allows collaborative learning on massive devices simultaneously without the control of a central server.
Existing theories claim that decentralization invariably undermines generalization.
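For reference, the iteration this equivalence result concerns is the standard D-SGD gossip-plus-gradient update (mixing matrix W, step size eta): x_i^{t+1} = sum_j W_ij x_j^t - eta * grad F_i(x_i^t; xi_i^t), i.e., each device averages with its neighbors and then takes a local stochastic gradient step.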
arXiv Detail & Related papers (2023-06-05T14:19:52Z) - Accelerating Parallel Stochastic Gradient Descent via Non-blocking Mini-batches [3.736244431175932]
Non-blocking SGD can address the straggler problem in a heterogeneous environment.
Non-blocking SGD takes up to 2x less time to reach the same training loss in a heterogeneous environment.
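A rough sketch of the non-blocking idea as summarized here, proceeding with the gradients of the fastest workers rather than waiting for stragglers; the threshold k, the thread-pool simulation, and all names are assumptions for illustration, not the paper's system:

```python
# Sketch: aggregate mini-batch gradients without blocking on stragglers,
# updating as soon as the fastest k of n workers have reported.
import concurrent.futures as cf
import time
import numpy as np

def worker_gradient(worker_id, w, delay):
    time.sleep(delay)                        # heterogeneous compute speed
    return w - worker_id                     # toy gradient

n, k, lr = 8, 5, 0.1
w = np.zeros(4)
delays = np.linspace(0.01, 0.3, n)           # the last workers are stragglers

with cf.ThreadPoolExecutor(max_workers=n) as pool:
    futures = [pool.submit(worker_gradient, i, w.copy(), delays[i]) for i in range(n)]
    done = set()
    while len(done) < k:                     # collect only the first k gradients
        done, _ = cf.wait(futures, timeout=0.01)
    grads = [f.result() for f in list(done)[:k]]
    w = w - lr * np.mean(grads, axis=0)      # update without the slowest workers
print("updated model:", w)
```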
arXiv Detail & Related papers (2022-11-02T05:25:01Z) - Locally Asynchronous Stochastic Gradient Descent for Decentralised Deep Learning [0.0]
Local Asynchronous SGD (LASGD) is an asynchronous decentralized algorithm that relies on All Reduce for model synchronization.
We empirically validate LASGD's performance on image classification tasks on the ImageNet dataset.
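As summarized, each worker trains locally and relies on an All Reduce to synchronize models; a bare-bones sketch of that pattern, with a plain mean standing in for All Reduce and uneven local step counts standing in for asynchrony (these details are assumptions, not LASGD itself):

```python
# Bare-bones sketch: workers take different numbers of local SGD steps, then
# an all-reduce (here just a mean over workers) synchronizes the models.
import numpy as np

n, dim, lr, rounds = 4, 6, 0.1, 10
rng = np.random.default_rng(2)
targets = rng.normal(size=(n, dim))
models = np.zeros((n, dim))

def allreduce_mean(models):
    # Stand-in for an All-Reduce: every worker ends up with the same average.
    avg = models.mean(axis=0)
    return np.tile(avg, (len(models), 1))

for r in range(rounds):
    for i in range(n):
        for _ in range(rng.integers(1, 6)):   # uneven local progress per worker
            models[i] -= lr * (models[i] - targets[i])
    models = allreduce_mean(models)

print("all workers agree:", np.allclose(models, models[0]))
```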
arXiv Detail & Related papers (2022-03-24T14:25:15Z) - Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning [58.79085525115987]
Local methods are one of the promising approaches to reduce communication time.
We show that the communication complexity is better than that of non-local methods when the heterogeneity of the local datasets is smaller than the smoothness of the local loss.
arXiv Detail & Related papers (2022-02-12T15:12:17Z) - A(DP)$^2$SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent with Differential Privacy [15.038697541988746]
A popular distributed learning strategy is federated learning, where there is a central server storing the global model and a set of local computing nodes updating the model parameters with their corresponding data.
In this paper, we present a differentially private version of the asynchronous decentralized parallel SGD framework, or A(DP)$^2$SGD for short, which maintains the communication efficiency of ADPSGD and prevents inference by malicious participants.
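The privacy mechanism is not spelled out in this summary, but the generic pattern for making a shared gradient differentially private is to clip it and add calibrated Gaussian noise before any exchange. A sketch of that standard step; the clip norm, noise multiplier, and the pairwise-averaging update are illustrative assumptions rather than the A(DP)$^2$SGD recipe:

```python
# Sketch of a standard differentially private gradient step: clip the gradient
# and add Gaussian noise before it is shared with neighbors.
import numpy as np

def dp_gradient(grad, rng, clip_norm=1.0, noise_multiplier=1.0):
    # Clip to a fixed l2 norm and add calibrated Gaussian noise.
    scale = min(1.0, clip_norm / (np.linalg.norm(grad) + 1e-12))
    return grad * scale + rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)

rng = np.random.default_rng(0)
x_local, x_neighbor, lr = np.ones(5), np.zeros(5), 0.1
g = dp_gradient(x_local - np.arange(5), rng)     # toy local gradient, privatized
x_local = 0.5 * (x_local + x_neighbor) - lr * g  # pairwise average plus noisy step
print(x_local)
```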
arXiv Detail & Related papers (2020-08-21T00:56:22Z) - Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
Communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence bounds than error feedback for non-convex distributed problems.
We also propose DEFA to accelerate the generalization of DEF, which shows better bounds than DEF.
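The generic error-feedback mechanism referenced here keeps the part of the gradient that compression dropped and adds it back at the next step. A sketch with random-k sparsification; the compressor choice and names are assumptions, not the DEF construction:

```python
# Sketch of generic error feedback with random sparsification: send a randomly
# masked gradient, remember the dropped part, and add it back next step.
import numpy as np

def rand_k(v, k, rng):
    # Random-k sparsification: keep k randomly chosen coordinates, zero the rest.
    idx = rng.choice(v.size, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(0)
residual = np.zeros(10)
for step in range(5):
    g = rng.normal(size=10)                 # fresh stochastic gradient
    corrected = g + residual                # add back what was not sent before
    sent = rand_k(corrected, k=2, rng=rng)  # this is all that goes on the wire
    residual = corrected - sent             # remember the compression error
print("residual norm:", np.linalg.norm(residual))
```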
arXiv Detail & Related papers (2020-04-11T03:50:59Z) - A Unified Theory of Decentralized SGD with Changing Topology and Local Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized SGD methods with changing topology and local updates.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
arXiv Detail & Related papers (2020-03-23T17:49:15Z)