Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity
- URL: http://arxiv.org/abs/2509.22860v2
- Date: Tue, 30 Sep 2025 15:11:44 GMT
- Title: Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity
- Authors: Artavazd Maranjyan, Peter Richtárik,
- Abstract summary: We introduce Ringleader ASGD, the first asynchronous algorithm that attains the theoretical lower bounds for parallel computation.<n>Our analysis further establishes that Ringleader ASGD remains optimal under arbitrary gradient and even time-varying speeds.
- Score: 51.56484100374058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Asynchronous stochastic gradient methods are central to scalable distributed optimization, particularly when devices differ in computational capabilities. Such settings arise naturally in federated learning, where training takes place on smartphones and other heterogeneous edge devices. In addition to varying computation speeds, these devices often hold data from different distributions. However, existing asynchronous SGD methods struggle in such heterogeneous settings and face two key limitations. First, many rely on unrealistic assumptions of similarity across workers' data distributions. Second, methods that relax this assumption still fail to achieve theoretically optimal performance under heterogeneous computation times. We introduce Ringleader ASGD, the first asynchronous SGD algorithm that attains the theoretical lower bounds for parallel first-order stochastic methods in the smooth nonconvex regime, thereby achieving optimal time complexity under data heterogeneity and without restrictive similarity assumptions. Our analysis further establishes that Ringleader ASGD remains optimal under arbitrary and even time-varying worker computation speeds, closing a fundamental gap in the theory of asynchronous optimization.
Related papers
- Do We Need Asynchronous SGD? On the Near-Optimality of Synchronous Methods [59.72933231179977]
We revisit Synchronous SGD and its robust variant, called $m$-Synchronous SGD, and theoretically show that they are nearly optimal in many heterogeneous computation scenarios.<n>While synchronous methods are not universal solutions and there exist tasks where asynchronous methods may be necessary, we show that they are sufficient for many modern heterogeneous computation scenarios.
arXiv Detail & Related papers (2026-02-03T18:02:14Z) - First Provably Optimal Asynchronous SGD for Homogeneous and Heterogeneous Data [0.0]
dissertation develops a rigorous framework for asynchronous order optimization.<n>We show that with proper design, asynchronous SGD can achieve optimal time complexity, matching guarantees previously known only for synchronous methods.
arXiv Detail & Related papers (2026-01-05T19:51:09Z) - Adaptive Deadline and Batch Layered Synchronized Federated Learning [66.93447103966439]
Federated learning (FL) enables collaborative model training across distributed edge devices while preserving data privacy, and typically operates in a round-based synchronous manner.<n>We propose ADEL-FL, a novel framework that jointly optimize per-round deadlines and user-specific batch sizes for layer-wise aggregation.
arXiv Detail & Related papers (2025-05-29T19:59:18Z) - Ringmaster ASGD: The First Asynchronous SGD with Optimal Time Complexity [92.1840862558718]
Ringmaster ASGD achieves optimal time complexity under arbitrarily heterogeneous computation times.<n>This makes it the first Asynchronous SGD method to meet the theoretical lower bounds for time complexity in such scenarios.
arXiv Detail & Related papers (2025-01-27T16:07:26Z) - MindFlayer SGD: Efficient Parallel SGD in the Presence of Heterogeneous and Random Worker Compute Times [49.1574468325115]
We investigate the problem of minimizing the expectation of smooth non functions in a setting with multiple parallel workers that are able to compute optimal gradients.<n>A challenge in this context is the presence of arbitrarily heterogeneous and distributed compute times.<n>We introduce MindFlayer SGD, a novel parallel SGD method specifically designed to handle this gap.
arXiv Detail & Related papers (2024-10-05T21:11:32Z) - Dual-Delayed Asynchronous SGD for Arbitrarily Heterogeneous Data [22.917944307972434]
We consider the distributed learning problem with data across multiple workers under the orchestration of a central server.
We introduce the textitdual-delayed asynchronous SGD (DuDe-ASGD) algorithm designed to the adverse effects of data iterations.
DuDe-ASGD makes full use of stale gradients from all workers during asynchronous training, leading to two time lags in the model parameters and data samples utilized in the servers.
arXiv Detail & Related papers (2024-05-27T09:00:30Z) - Freya PAGE: First Optimal Time Complexity for Large-Scale Nonconvex Finite-Sum Optimization with Heterogeneous Asynchronous Computations [92.1840862558718]
In practical distributed systems, workers typically not homogeneous, and can have highly varying processing times.
We introduce a new parallel method Freya to handle arbitrarily slow computations.
We show that Freya offers significantly improved complexity guarantees compared to all previous methods.
arXiv Detail & Related papers (2024-05-24T13:33:30Z) - AsGrad: A Sharp Unified Analysis of Asynchronous-SGD Algorithms [45.90015262911875]
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting.
As a by-product of our analysis, we also demonstrate guarantees for gradient-type algorithms such as SGD with random tightness.
arXiv Detail & Related papers (2023-10-31T13:44:53Z) - Robust Fully-Asynchronous Methods for Distributed Training over General Architecture [11.480605289411807]
Perfect synchronization in distributed machine learning problems is inefficient and even impossible due to the existence of latency, package losses and stragglers.
We propose Fully-Asynchronous Gradient Tracking method (R-FAST), where each device performs local computation and communication at its own without any form of impact.
arXiv Detail & Related papers (2023-07-21T14:36:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.