Asynchronous Heavy-Tailed Optimization
- URL: http://arxiv.org/abs/2602.18002v1
- Date: Fri, 20 Feb 2026 05:28:48 GMT
- Title: Asynchronous Heavy-Tailed Optimization
- Authors: Junfei Sun, Dixi Yao, Xuchen Gong, Tahseen Rabbani, Manzil Zaheer, Tian Li
- Abstract summary: We propose two communication schemes that handle stragglers with asynchronous updates in the presence of heavy-tailed gradient noise. Our convergence guarantees under heavy-tailed noise match the rate of the synchronous counterparts and improve delay tolerance compared with existing asynchronous approaches.
- Score: 23.148100548141983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Heavy-tailed stochastic gradient noise, commonly observed in transformer models, can destabilize the optimization process. Recent works mainly focus on developing and understanding approaches to address heavy-tailed noise in the centralized or distributed, synchronous setting, leaving the interactions between such noise and asynchronous optimization underexplored. In this work, we investigate two communication schemes that handle stragglers with asynchronous updates in the presence of heavy-tailed gradient noise. We propose and theoretically analyze algorithmic modifications based on delay-aware learning rate scheduling and delay compensation to enhance the performance of asynchronous algorithms. Our convergence guarantees under heavy-tailed noise match the rate of the synchronous counterparts and improve delay tolerance compared with existing asynchronous approaches. Empirically, our approaches outperform prior synchronous and asynchronous methods in terms of accuracy/runtime trade-offs and are more robust to hyperparameters in both image and language tasks.
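As a rough illustration of the ideas named in the abstract (not the paper's algorithm), the following Python sketch combines gradient clipping, a common remedy for heavy-tailed noise, with a step size that shrinks as the gradient becomes more stale. The 1/(1 + delay) rule, the clipping threshold, the toy objective f(x) = x^2 / 2, the Pareto noise model, and all function names are assumptions made for this example.

```python
import random


def clip(g, tau):
    """Clip a scalar gradient g to the interval [-tau, tau]."""
    return max(-tau, min(tau, g))


def delay_aware_lr(base_lr, delay):
    """Hypothetical delay-aware rule: shrink the step size as staleness grows."""
    return base_lr / (1.0 + delay)


def heavy_tailed_noise(rng, alpha=1.5):
    """Symmetric Pareto-type noise: zero mean, infinite variance for 1 < alpha <= 2."""
    magnitude = rng.paretovariate(alpha) - 1.0
    return magnitude if rng.random() < 0.5 else -magnitude


def async_clipped_sgd(steps=2000, max_delay=5, base_lr=0.1, tau=1.0, seed=0):
    """Simulate asynchronous SGD on f(x) = x^2 / 2 with stale, noisy gradients."""
    rng = random.Random(seed)
    x = 5.0
    history = [x]  # past iterates, so a "worker" can read a stale one
    for _ in range(steps):
        # Pick how stale the gradient is (bounded by the available history).
        delay = rng.randint(0, min(max_delay, len(history) - 1))
        stale_x = history[-1 - delay]
        # Gradient of x^2 / 2 at the stale iterate, plus heavy-tailed noise.
        g = stale_x + heavy_tailed_noise(rng)
        # Clip the gradient, then apply a step size that depends on the delay.
        x -= delay_aware_lr(base_lr, delay) * clip(g, tau)
        history.append(x)
    return x
```

Calling `async_clipped_sgd()` with the defaults runs 2000 simulated updates starting from x = 5.0. Because each update moves the iterate by at most `base_lr * tau`, a single extreme noise sample cannot throw it far off course.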
Related papers
- Do We Need Asynchronous SGD? On the Near-Optimality of Synchronous Methods [59.72933231179977]
We revisit Synchronous SGD and its robust variant, called $m$-Synchronous SGD, and theoretically show that they are nearly optimal in many heterogeneous computation scenarios. While synchronous methods are not universal solutions and there exist tasks where asynchronous methods may be necessary, we show that they are sufficient for many modern heterogeneous computation scenarios.
arXiv Detail & Related papers (2026-02-03T18:02:14Z)
- AsyncMesh: Fully Asynchronous Optimization for Data and Pipeline Parallelism [54.8494905524997]
We introduce asynchronous updates across both parallelism axes, relaxing the co-location requirement. We provide convergence guarantees for both sparse averaging and asynchronous updates. Experiments on large-scale language models demonstrate that our approach matches the performance of the fully synchronous baseline.
arXiv Detail & Related papers (2026-01-30T01:24:47Z)
- First Provably Optimal Asynchronous SGD for Homogeneous and Heterogeneous Data [0.0]
This dissertation develops a rigorous framework for asynchronous first-order optimization. We show that with proper design, asynchronous SGD can achieve optimal time complexity, matching guarantees previously known only for synchronous methods.
arXiv Detail & Related papers (2026-01-05T19:51:09Z)
- Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity [51.56484100374058]
We introduce Ringleader ASGD, the first asynchronous algorithm that attains the theoretical lower bounds for parallel computation. Our analysis further establishes that Ringleader ASGD remains optimal under arbitrary gradient and even time-varying speeds.
arXiv Detail & Related papers (2025-09-26T19:19:15Z)
- Nesterov Method for Asynchronous Pipeline Parallel Optimization [59.79227116582264]
We introduce a variant of Nesterov Accelerated Gradient (NAG) for asynchronous optimization in Pipeline Parallelism. Specifically, we modify the look-ahead step in NAG to effectively address the staleness in gradients. We theoretically prove that our approach converges at a sublinear rate in the presence of fixed delay in gradients.
arXiv Detail & Related papers (2025-05-02T08:23:29Z)
- Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling [81.37449968164692]
We propose Synchronized Coupled Sampling (SynCoS), a novel inference framework that synchronizes denoising paths across the entire video. Our approach combines two complementary sampling strategies, which ensure seamless local transitions and enforce global coherence. Extensive experiments show that SynCoS significantly improves multi-event long video generation, achieving smoother transitions and superior long-range coherence.
arXiv Detail & Related papers (2025-03-11T16:43:45Z)
- DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speed up a common Stochastic Approximation problem by acting in parallel and communicating with a central server.
To mitigate the effect of delays and stragglers, we propose DASA, a Delay-Adaptive algorithm for multi-agent Stochastic Approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z)
- Asynchronous Distributed Optimization with Delay-free Parameters [9.062164411594175]
This paper develops asynchronous versions of two distributed algorithms, Prox-DGD and DGD-ATC, for solving consensus optimization problems over undirected networks.
In contrast to alternatives, our algorithms can converge to the fixed point set of their synchronous counterparts using step-sizes that are independent of the delays.
arXiv Detail & Related papers (2023-12-11T16:33:38Z)
- Asynchronous Iterations in Optimization: New Sequence Results and Sharper Algorithmic Guarantees [10.984101749941471]
We introduce novel convergence results for asynchronous iterations that appear in the analysis of parallel and distributed optimization algorithms.
Results are simple to apply and give explicit estimates for how the degree of asynchrony impacts the convergence rates of the iterates.
arXiv Detail & Related papers (2021-09-09T19:08:56Z)
- Decentralized Optimization with Heterogeneous Delays: a Continuous-Time Approach [6.187780920448871]
We propose a novel continuous-time framework to analyze asynchronous algorithms.
We describe a fully asynchronous decentralized algorithm to minimize the sum of smooth and strongly convex functions.
arXiv Detail & Related papers (2021-06-07T13:09:25Z)
- Advances in Asynchronous Parallel and Distributed Optimization [11.438194383787604]
Asynchronous methods do not require all processors to maintain a consistent view of the optimization variables.
They are not sensitive to issues like stragglers (i.e., slow nodes) and unreliable communication links.
This article reviews recent developments in the design and analysis of asynchronous optimization methods.
arXiv Detail & Related papers (2020-06-24T16:10:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.