Asynchronous Decentralized SGD under Non-Convexity: A Block-Coordinate Descent Framework
- URL: http://arxiv.org/abs/2505.10322v1
- Date: Thu, 15 May 2025 14:06:38 GMT
- Title: Asynchronous Decentralized SGD under Non-Convexity: A Block-Coordinate Descent Framework
- Authors: Yijie Zhou, Shi Pu
- Abstract summary: This paper introduces a refined model of Asynchronous Decentralized Stochastic Gradient Descent (ADSGD) under practical assumptions. With its simplicity, efficiency in memory and communication, and resilience to communication and computation delays, ADSGD is well-suited for real-world decentralized learning tasks.
- Score: 5.001721812806788
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized optimization has become vital for leveraging distributed data without central control, enhancing scalability and privacy. However, practical deployments face fundamental challenges due to heterogeneous computation speeds and unpredictable communication delays. This paper introduces a refined model of Asynchronous Decentralized Stochastic Gradient Descent (ADSGD) under practical assumptions of bounded computation and communication times. To understand the convergence of ADSGD, we first analyze Asynchronous Stochastic Block Coordinate Descent (ASBCD) as a tool, and then show that ADSGD converges under computation-delay-independent step sizes. The convergence result is established without assuming bounded data heterogeneity. Empirical experiments reveal that ADSGD outperforms existing methods in wall-clock convergence time across various scenarios. With its simplicity, efficiency in memory and communication, and resilience to communication and computation delays, ADSGD is well-suited for real-world decentralized learning tasks.
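The abstract gives no pseudocode, so the following is a minimal, hypothetical Python sketch of what one asynchronous decentralized SGD update could look like: each agent gossip-averages its model with the most recently received (possibly stale) neighbor copies and then takes a local stochastic gradient step. All names (adsgd_step, W, grad_fn) and the synthetic driver are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def adsgd_step(i, x_i, neighbor_copies, W, grad_fn, step_size):
    """One illustrative asynchronous decentralized SGD update for agent i."""
    # Gossip step: combine own model with the most recently received
    # (possibly stale) neighbor models using mixing weights W[i, :].
    mixed = W[i, i] * x_i + sum(W[i, j] * x_j for j, x_j in neighbor_copies.items())
    # Local stochastic gradient on agent i's own data; the quoted convergence
    # result does not require a bounded data-heterogeneity assumption.
    g = grad_fn(x_i)
    # Descent step; agents update without waiting for slower neighbors.
    return mixed - step_size * g

# Tiny synthetic usage: 3 agents on a fully connected graph, each with a
# local least-squares objective (purely illustrative).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 3, 5
    A = [rng.standard_normal((10, d)) for _ in range(n)]
    b = [rng.standard_normal(10) for _ in range(n)]
    W = np.full((n, n), 0.25) + 0.25 * np.eye(n)   # doubly stochastic mixing matrix
    x = [np.zeros(d) for _ in range(n)]
    for _ in range(200):
        i = int(rng.integers(n))                   # an arbitrary agent wakes up (asynchrony)
        copies = {j: x[j] for j in range(n) if j != i}
        grad = lambda z, Ai=A[i], bi=b[i]: Ai.T @ (Ai @ z - bi) / len(bi)
        x[i] = adsgd_step(i, x[i], copies, W, grad, step_size=0.05)
```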
Related papers
- The Impact Analysis of Delays in Asynchronous Federated Learning with Data Heterogeneity for Edge Intelligence [10.54196990763149]
Federated learning (FL) has provided a new methodology for coordinating a group of clients to train a machine learning model collaboratively. This paper examines the impact of unknown causes of delay on training performance in an Asynchronous Federated Learning (AFL) system with data heterogeneity.
arXiv Detail & Related papers (2025-03-06T03:10:49Z)
- Stability and Generalization for Distributed SGDA [70.97400503482353]
We propose a stability-based generalization analysis framework for Distributed-SGDA.
We conduct a comprehensive analysis of stability error, generalization gap, and population risk across different metrics.
Our theoretical results reveal the trade-off between the generalization gap and optimization error.
arXiv Detail & Related papers (2024-11-14T11:16:32Z)
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated decentralized federated learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
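As a rough illustration of the two components named in the DFedCata summary, here is a hedged Python sketch of a Moreau-envelope-style local gradient and a Nesterov-style extrapolation step; the names and the exact placement of these steps are assumptions, not the paper's algorithm.

```python
import numpy as np

def moreau_local_grad(grad_f, x, anchor, lam):
    """Gradient of a Moreau-envelope-style local objective
    f_i(x) + ||x - anchor||^2 / (2 * lam): the proximal term pulls each client's
    parameters toward a shared anchor, limiting parameter inconsistency
    (illustrative, not the paper's exact formulation)."""
    return grad_f(x) + (x - anchor) / lam

def nesterov_extrapolate(x_curr, x_prev, momentum=0.9):
    """Nesterov-style look-ahead applied around the aggregation phase to
    accelerate convergence (illustrative)."""
    return x_curr + momentum * (x_curr - x_prev)
```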
- Decentralized Smoothing ADMM for Quantile Regression with Non-Convex Sparse Penalties [3.269165283595478]
In the rapidly evolving internet-of-things (IoT) ecosystem, effective data analysis techniques are crucial for handling distributed data generated by sensors.
The proposed method addresses the limitations of existing approaches, such as the sub-gradient consensus approach, which fails to distinguish between active and non-active coefficients.
arXiv Detail & Related papers (2024-08-02T15:00:04Z)
- Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework [56.82432591933544]
Distributed stochastic gradient descent (SGD) has attracted considerable recent attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy in machine learning. This paper presents an analysis of the run time and staleness of distributed SGD based on stochastic delay differential equations (SDDEs) and the approximation of gradient arrivals. It is interestingly shown that increasing the number of activated workers does not necessarily accelerate distributed SGD due to staleness.
arXiv Detail & Related papers (2024-06-17T02:56:55Z)
- Towards Communication-efficient Federated Learning via Sparse and Aligned Adaptive Optimization [65.85963235502322]
Federated Adam (FedAdam) algorithms suffer from a threefold increase in uplink communication overhead.
We propose a novel sparse FedAdam algorithm called FedAdam-SSM, wherein distributed devices sparsify the updates of local model parameters and moment estimates.
By minimizing the divergence bound between the model trained by FedAdam-SSM and centralized Adam, we optimize the SSM to mitigate the learning performance degradation caused by sparsification error.
arXiv Detail & Related papers (2024-05-28T07:56:49Z)
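As a hedged reading of the FedAdam-SSM summary above, the sketch below applies one shared top-k sparsification mask to the local model update and to the Adam moment estimates before uplink; the function names and the top-k rule are illustrative assumptions, not the paper's exact SSM construction.

```python
import numpy as np

def shared_topk_mask(delta, k):
    """Boolean mask keeping the k largest-magnitude entries of the update."""
    idx = np.argsort(np.abs(delta).ravel())[-k:]
    mask = np.zeros(delta.size, dtype=bool)
    mask[idx] = True
    return mask.reshape(delta.shape)

def sparsify_adam_upload(delta, m, v, k):
    """Apply one shared sparsification mask to the local model update and the
    first/second moment estimates, so all three stay aligned on the same
    coordinates before transmission (illustrative sketch)."""
    mask = shared_topk_mask(delta, k)
    return delta * mask, m * mask, v * mask

# Example: keep 10 of 100 coordinates of a 1-D update vector.
delta, m, v = np.random.randn(100), np.random.randn(100), np.random.rand(100)
sparse_delta, sparse_m, sparse_v = sparsify_adam_upload(delta, m, v, k=10)
```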
- Convergence of Decentralized Stochastic Subgradient-based Methods for Nonsmooth Nonconvex functions [10.278310909980576]
We propose a general framework that unifies various decentralized subgradient-based methods. We prove convergence guarantees for some well-recognized decentralized subgradient-based methods.
arXiv Detail & Related papers (2024-03-18T08:35:17Z)
- Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm [80.94861441583275]
We investigate the generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm.
Our results analyze the impact of different topologies on the generalization of D-SGDA.
We also balance the generalization with the optimization error to obtain the optimal trade-off in the convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z)
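For readers unfamiliar with D-SGDA, here is a minimal, hypothetical Python sketch of a single decentralized stochastic gradient descent ascent update for a min-max problem: both the primal and dual variables are gossip-averaged with neighbors, then moved by descent and ascent steps respectively. Names and structure are assumptions, not taken from the paper.

```python
import numpy as np

def dsgda_step(i, x, y, W, grad_x, grad_y, eta_x, eta_y):
    """One illustrative D-SGDA update for agent i on a local min-max objective.

    x, y          : lists of primal / dual variables, one array per agent
    W             : doubly-stochastic mixing matrix over the agents
    grad_x, grad_y: stochastic gradients of agent i's loss w.r.t. x and y
    """
    # Gossip-average both variables with the neighbors' copies.
    x_mix = sum(W[i, j] * x[j] for j in range(len(x)))
    y_mix = sum(W[i, j] * y[j] for j in range(len(y)))
    # Descend in the primal variable, ascend in the dual variable.
    return x_mix - eta_x * grad_x(x[i], y[i]), y_mix + eta_y * grad_y(x[i], y[i])
```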
- Distributed Semi-supervised Fuzzy Regression with Interpolation Consistency Regularization [38.16335448831723]
We propose a distributed semi-supervised fuzzy regression (DSFR) model with fuzzy if-then rules and interpolation consistency regularization (ICR).
Experimental results on both artificial and real-world datasets show that the proposed DSFR model can achieve much better performance than the state-of-the-art DSSL algorithm.
arXiv Detail & Related papers (2022-09-18T04:46:51Z)
- Decentralized Optimization with Heterogeneous Delays: a Continuous-Time Approach [6.187780920448871]
We propose a novel continuous-time framework to analyze asynchronous algorithms.
We describe a fully asynchronous decentralized algorithm to minimize the sum of smooth and strongly convex functions.
arXiv Detail & Related papers (2021-06-07T13:09:25Z)
- A Unified Theory of Decentralized SGD with Changing Topology and Local Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
arXiv Detail & Related papers (2020-03-23T17:49:15Z)