SYNTHESIS: A Semi-Asynchronous Path-Integrated Stochastic Gradient
Method for Distributed Learning in Computing Clusters
- URL: http://arxiv.org/abs/2208.08425v1
- Date: Wed, 17 Aug 2022 17:42:33 GMT
- Title: SYNTHESIS: A Semi-Asynchronous Path-Integrated Stochastic Gradient
Method for Distributed Learning in Computing Clusters
- Authors: Zhuqing Liu, Xin Zhang, Jia Liu
- Abstract summary: SYNTHESIS (semi-asynchronous path-integrated stochastic gradient search) is developed to overcome the limitations of both synchronous and asynchronous distributed learning algorithms.
The SYNTHESIS algorithms have \(O(\sqrt{N}\epsilon^{-2}(\Delta+1)+N)\) and \(O(\sqrt{N}\epsilon^{-2}(\Delta+1) d+N)\) computational complexities for achieving an
\(\epsilon\)-stationary point in non-convex learning under distributed and shared memory architectures
- Score: 7.968142741470549
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To increase the training speed of distributed learning, recent years have
witnessed a significant amount of interest in developing both synchronous and
asynchronous distributed stochastic variance-reduced optimization methods.
However, all existing synchronous and asynchronous distributed training
algorithms suffer from various limitations in either convergence speed or
implementation complexity. This motivates us to propose an algorithm called
SYNTHESIS (semi-asynchronous path-integrated stochastic gradient search), which
leverages the special structure of the variance-reduction framework to overcome
the limitations of both synchronous and asynchronous distributed learning
algorithms, while retaining their salient features. We consider two
implementations of SYNTHESIS under distributed and shared memory architectures.
We show that our SYNTHESIS algorithms have
\(O(\sqrt{N}\epsilon^{-2}(\Delta+1)+N)\) and
\(O(\sqrt{N}\epsilon^{-2}(\Delta+1) d+N)\) computational complexities for
achieving an \(\epsilon\)-stationary point in non-convex learning under
distributed and shared memory architectures, respectively, where \(N\) denotes
the total number of training samples and \(\Delta\) represents the maximum
delay of the workers. Moreover, we investigate the generalization performance
of SYNTHESIS by establishing algorithmic stability bounds for quadratic strongly
convex and non-convex optimization. We further conduct extensive numerical
experiments to verify our theoretical findings.
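The abstract gives no pseudocode, so the following is a minimal, hypothetical Python sketch of the general idea only: a path-integrated (SARAH/SPIDER-style) variance-reduced estimator in which the recursive gradient correction may be evaluated on iterates that are up to a bounded number of steps stale, mimicking semi-asynchronous workers with maximum delay \(\Delta\). All names and choices here (`path_integrated_sgd`, `max_delay`, the toy logistic loss, step size, and epoch/inner-loop schedule) are illustrative assumptions, not the paper's SYNTHESIS algorithm.

```python
# Hypothetical sketch: a path-integrated (SARAH/SPIDER-style) variance-reduced
# update whose recursive correction may be evaluated on stale iterates,
# imitating semi-asynchronous workers with maximum delay `max_delay`.
# This is NOT the SYNTHESIS pseudocode from the paper; all names are illustrative.
import numpy as np

def stochastic_grad(x, idx, data, labels):
    """Mini-batch gradient of a toy logistic loss at x."""
    z = data[idx] @ x
    residual = 1.0 / (1.0 + np.exp(-z)) - labels[idx]
    return data[idx].T @ residual / len(idx)

def path_integrated_sgd(data, labels, n_epochs=5, inner_len=100,
                        batch_size=32, lr=0.05, max_delay=4, seed=0):
    rng = np.random.default_rng(seed)
    n, d = data.shape
    x = np.zeros(d)
    history = [x.copy()]                      # iterates a stale worker might read
    for _ in range(n_epochs):
        # Periodic full gradient: the "synchronous" anchor of the estimator.
        v = stochastic_grad(x, np.arange(n), data, labels)
        for _ in range(inner_len):
            x = x - lr * v
            history.append(x.copy())
            # A worker's gradients may be based on iterates up to `max_delay`
            # steps old: pick a stale consecutive pair of iterates.
            delta = rng.integers(0, min(max_delay, len(history) - 2) + 1)
            x_cur, x_prev = history[-1 - delta], history[-2 - delta]
            idx = rng.choice(n, size=min(batch_size, n), replace=False)
            # Path-integrated (recursive) correction: accumulate the gradient
            # difference along the iterate path instead of recomputing from scratch.
            v = v + stochastic_grad(x_cur, idx, data, labels) \
                  - stochastic_grad(x_prev, idx, data, labels)
    return x
```

In a toy run, e.g. `x = path_integrated_sgd(X, y)`, the periodic full gradient plays the role of synchronization while the delayed recursive corrections stand in for asynchronous worker updates; the actual SYNTHESIS update rule and its distributed and shared-memory implementations are specified in the paper itself.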
Related papers
- Freya PAGE: First Optimal Time Complexity for Large-Scale Nonconvex Finite-Sum Optimization with Heterogeneous Asynchronous Computations [92.1840862558718]
In practical distributed systems, workers are typically not homogeneous and can have highly varying processing times.
We introduce Freya PAGE, a new parallel method designed to handle arbitrarily slow computations.
We show that Freya offers significantly improved complexity guarantees compared to all previous methods.
arXiv Detail & Related papers (2024-05-24T13:33:30Z)
- Asynchronous Distributed Optimization with Delay-free Parameters [9.062164411594175]
This paper develops asynchronous versions of two distributed algorithms, Prox-DGD and DGD-ATC, for solving consensus optimization problems over undirected networks.
In contrast to alternatives, our algorithms can converge to the fixed point set of their synchronous counterparts using step-sizes that are independent of the delays.
arXiv Detail & Related papers (2023-12-11T16:33:38Z)
- AsGrad: A Sharp Unified Analysis of Asynchronous-SGD Algorithms [45.90015262911875]
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting.
As a by-product of our analysis, we also demonstrate convergence guarantees for gradient-type algorithms such as SGD with random reshuffling.
arXiv Detail & Related papers (2023-10-31T13:44:53Z)
- Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) methods and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z)
- Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We study a training algorithm for distributed computation with workers that have varying communication frequency.
In this work, we obtain a tighter convergence rate of $\mathcal{O}(\sigma^2\epsilon^{-2} + \tau_{avg}\epsilon^{-1})$, where $\tau_{avg}$ is the average delay.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z)
- DASHA: Distributed Nonconvex Optimization with Communication
Compression, Optimal Oracle Complexity, and No Client Synchronization [77.34726150561087]
We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems.
Unlike MARINA, the new methods DASHA and DASHA-MVR send compressed vectors only and never synchronize the nodes, which makes them more practical for learning.
arXiv Detail & Related papers (2022-02-02T20:10:40Z)
- Asynchronous Iterations in Optimization: New Sequence Results and
Sharper Algorithmic Guarantees [10.984101749941471]
We introduce novel convergence results for asynchronous iterations that appear in the analysis of parallel and distributed optimization algorithms.
Results are simple to apply and give explicit estimates for how the degree of asynchrony impacts the convergence rates of the iterates.
arXiv Detail & Related papers (2021-09-09T19:08:56Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only a single data point per step.
Our results are expressed in the form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)