Linear Speedup of Incremental Aggregated Gradient Methods on Streaming
Data
- URL: http://arxiv.org/abs/2309.04980v1
- Date: Sun, 10 Sep 2023 10:08:52 GMT
- Title: Linear Speedup of Incremental Aggregated Gradient Methods on Streaming
Data
- Authors: Xiaolu Wang, Cheng Jin, Hoi-To Wai, Yuantao Gu
- Abstract summary: This paper considers a type of incremental aggregated gradient (IAG) method for large-scale distributed optimization.
We show that the streaming IAG method achieves linear speedup when the workers are updating frequently enough.
- Score: 38.54333970135826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers a type of incremental aggregated gradient (IAG) method
for large-scale distributed optimization. The IAG method is well suited for the
parameter server architecture as the latter can easily aggregate potentially
stale gradients contributed by workers. Although the convergence of IAG in the
case of deterministic gradient is well known, there are only a few results for
the case of its stochastic variant based on streaming data. Considering
strongly convex optimization, this paper shows that the streaming IAG method
achieves linear speedup when the workers are updating frequently enough, even
if the data sample distributions across workers are heterogeneous. We show that
the expected squared distance to the optimal solution decays at $\mathcal{O}((1+T)/(nt))$,
where $n$ is the number of workers, $t$ is the iteration number, and $T/n$ is the
update frequency of workers. Our analysis involves careful treatments of the
conditional expectations with stale gradients and a recursive system with both
delayed and noise terms, which are new to the analysis of IAG-type algorithms.
Numerical results are presented to verify our findings.
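To make the update pattern concrete, the following is a minimal sketch of streaming IAG, assuming a toy least-squares model with heterogeneous worker distributions, a round-robin reporting schedule, and an illustrative step size: the parameter server keeps the most recent (possibly stale) stochastic gradient from each worker and descends along their average. All data and constants below are illustrative, not the paper's experimental setup.
```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 4                   # problem dimension and number of workers (illustrative)
x_star = rng.normal(size=d)   # ground-truth parameter of the least-squares model

# Heterogeneous streaming data: each worker draws features at its own scale.
scales = 1.0 + rng.uniform(size=n)

def streaming_grad(x, i):
    """Stochastic gradient of 0.5*(a^T x - b)^2 on one fresh sample from worker i."""
    a = scales[i] * rng.normal(size=d)
    b = a @ x_star + 0.1 * rng.normal()
    return (a @ x - b) * a

x = np.zeros(d)
eta = 0.01                                                       # step size (illustrative)
grad_table = np.stack([streaming_grad(x, i) for i in range(n)])  # server-side table of last-reported gradients

for t in range(20000):
    i = t % n                                 # round-robin updates: each entry's staleness stays bounded
    grad_table[i] = streaming_grad(x, i)      # worker i reports a fresh streaming gradient at the current iterate
    x = x - eta * grad_table.mean(axis=0)     # server aggregates the (partially stale) table and takes a step

print("squared distance to x*:", float(np.sum((x - x_star) ** 2)))
```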
Related papers
- AsGrad: A Sharp Unified Analysis of Asynchronous-SGD Algorithms [45.90015262911875]
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting.
As a by-product of our analysis, we also demonstrate convergence guarantees for gradient-type algorithms such as SGD with random reshuffling.
arXiv Detail & Related papers (2023-10-31T13:44:53Z) - Sampling from Gaussian Process Posteriors using Stochastic Gradient
Descent [43.097493761380186]
Stochastic gradient algorithms are an efficient method for approximately solving linear systems.
We show that gradient descent produces accurate predictions, even in cases where it does not converge quickly to the optimum.
Experimentally, gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks.
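As a rough illustration of that idea, the sketch below runs stochastic gradient descent on the quadratic whose minimizer solves the GP posterior-mean system (K + sigma^2 I) v = y. The RBF kernel, the subsampled gradient estimator, the step size, and the iteration count are illustrative assumptions, not the algorithm from the paper.
```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 200, 0.1
X = rng.uniform(-3.0, 3.0, size=(n, 1))
y = np.sin(X[:, 0]) + np.sqrt(sigma2) * rng.normal(size=n)

K = np.exp(-0.5 * (X - X.T) ** 2)        # RBF kernel matrix (lengthscale 1, illustrative)
A = K + sigma2 * np.eye(n)               # the posterior-mean representer weights solve A v = y

v = np.zeros(n)
eta, batch = 5e-3, 32
for t in range(20000):
    idx = rng.choice(n, size=batch, replace=False)
    # Unbiased estimate of the full gradient A v - y, subsampling the kernel sum K v.
    g = (n / batch) * K[:, idx] @ v[idx] + sigma2 * v - y
    v -= eta * g

mean_pred = K @ v                        # approximate posterior mean at the training inputs
print("relative residual:", float(np.linalg.norm(A @ v - y) / np.linalg.norm(y)))
```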
arXiv Detail & Related papers (2023-06-20T15:07:37Z) - Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a global optimum under the same delay assumptions.
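A minimal sketch of the delayed-gradient pattern behind such methods, under simplified assumptions: a fixed delay, a toy strongly convex quadratic, and a diminishing step size, with the master applying each stochastic gradient even though it was computed at an iterate several steps old.
```python
import numpy as np
from collections import deque

rng = np.random.default_rng(2)
d, delay = 10, 5
A = np.diag(np.linspace(0.5, 2.0, d))          # toy strongly convex quadratic 0.5 * x^T A x

def stoch_grad(x):
    return A @ x + 0.05 * rng.normal(size=d)   # noisy gradient oracle

x = np.ones(d)
in_flight = deque(x.copy() for _ in range(delay))  # iterates the workers are still computing on

for t in range(5000):
    stale_x = in_flight.popleft()              # a worker finishes a gradient based on an old iterate
    g = stoch_grad(stale_x)
    x = x - (1.0 / (t + 50)) * g               # master applies the delayed stochastic gradient
    in_flight.append(x.copy())                 # the next gradient will be computed at the current iterate

print("distance to the optimum:", float(np.linalg.norm(x)))
```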
arXiv Detail & Related papers (2021-07-06T21:59:49Z) - Stochastic Reweighted Gradient Descent [4.355567556995855]
We propose an importance-sampling-based algorithm we call SRG (stochastic reweighted gradient).
We pay particular attention to the time and memory overhead of our proposed method.
We present empirical results to support our findings.
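A hedged sketch of the reweighting idea: example i is drawn with probability p_i and its gradient is scaled by 1/(n p_i) so the update stays unbiased. Maintaining p_i from stored per-example gradient norms, mixed with the uniform distribution for stability, is one simple choice used here for illustration; the actual SRG rule and its overhead guarantees may differ.
```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 500, 10
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star + 0.1 * rng.normal(size=n)

x = np.zeros(d)
norms = np.ones(n)          # stored per-example gradient-norm estimates: O(n) extra memory
eta = 0.02

for t in range(20000):
    p = 0.5 * norms / norms.sum() + 0.5 / n   # mix with uniform so 1/(n*p_i) stays bounded
    i = rng.choice(n, p=p)
    g_i = (A[i] @ x - b[i]) * A[i]            # gradient of 0.5 * (a_i^T x - b_i)^2
    x = x - eta * g_i / (n * p[i])            # 1/(n*p_i) reweighting keeps the update unbiased
    norms[i] = np.linalg.norm(g_i) + 1e-8     # refresh the stored norm for the sampled example

print("parameter error:", float(np.linalg.norm(x - x_star)))
```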
arXiv Detail & Related papers (2021-03-23T04:09:43Z) - Stochastic Gradient Variance Reduction by Solving a Filtering Problem [0.951828574518325]
Deep neural networks (DNN) are typically optimized using stochastic gradient descent (SGD).
The estimation of the gradient from samples tends to be noisy and unreliable, resulting in large gradient variance and poor convergence.
We propose Filter Gradient Descent (FGD), an efficient optimization algorithm that makes consistent estimates of the gradient.
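To illustrate the idea of filtering noisy gradients, the sketch below uses a simple exponential moving-average filter as a stand-in; the paper formulates the estimator as a filtering problem, and its actual filter design, objective, and noise model may differ from the illustrative ones here.
```python
import numpy as np

rng = np.random.default_rng(4)
d = 20
x_star = rng.normal(size=d)

def noisy_grad(x):
    return (x - x_star) + 1.0 * rng.normal(size=d)   # gradient of 0.5*||x - x*||^2 plus heavy noise

x = np.zeros(d)
g_filt = np.zeros(d)
eta, beta = 0.1, 0.9

for t in range(3000):
    g = noisy_grad(x)
    g_filt = beta * g_filt + (1.0 - beta) * g        # filter the raw stochastic gradient
    x = x - eta * g_filt                             # descend along the filtered, lower-variance estimate

print("distance to the optimum:", float(np.linalg.norm(x - x_star)))
```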
arXiv Detail & Related papers (2020-12-22T23:48:42Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
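A hedged sketch of coordinate-wise zeroth-order estimation with importance sampling: a few coordinates are sampled from a non-uniform distribution, estimated by central finite differences, and reweighted to keep the gradient estimate unbiased, so each step costs only a handful of function queries. The test function, sampling weights, and step sizes are illustrative assumptions rather than the paper's exact scheme.
```python
import numpy as np

rng = np.random.default_rng(5)
d = 20
H = np.diag(np.linspace(0.5, 5.0, d))

def f(x):
    return 0.5 * x @ H @ x                 # black-box objective: only function values are queried

p = np.linspace(1.0, 2.0, d); p /= p.sum() # non-uniform coordinate-sampling weights (illustrative)
x = np.ones(d)
eta, mu, k = 0.02, 1e-4, 4                 # step size, finite-difference radius, coordinate draws per step

for t in range(5000):
    g_hat = np.zeros(d)
    for j in rng.choice(d, size=k, replace=True, p=p):   # k independent coordinate draws
        e = np.zeros(d); e[j] = mu
        deriv = (f(x + e) - f(x - e)) / (2.0 * mu)       # central-difference estimate of df/dx_j
        g_hat[j] += deriv / (k * p[j])                   # 1/(k*p_j) reweighting makes g_hat unbiased
    x = x - eta * g_hat

print("f(x) after ZO descent:", float(f(x)))
```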
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Gradient Coding with Dynamic Clustering for Straggler Mitigation [57.9123881133818]
GC-DC regulates the number of straggling workers in each cluster based on the straggler behavior in the previous iteration.
We numerically show that GC-DC provides significant improvements in the average completion time (of each iteration) with no increase in the communication load compared to the original GC scheme.
arXiv Detail & Related papers (2020-11-03T18:52:15Z) - Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad
Samples [67.11669996924671]
We introduce a simple (one line of code) modification to the Generative Adversarial Network (GAN) training algorithm.
When updating the generator parameters, we zero out the gradient contributions from the elements of the batch that the critic scores as least realistic.
We show that this 'top-k update' procedure is a generally applicable improvement.
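A minimal, framework-agnostic sketch of the selection step: only the k batch elements the critic scores as most realistic contribute to the generator update, so the rest contribute zero gradient. The toy critic scores and per-sample loss surrogate below stand in for an actual GAN training loop.
```python
import numpy as np

rng = np.random.default_rng(6)
batch_size, k = 8, 5
critic_scores = rng.normal(size=batch_size)          # critic outputs for a batch of generated samples
per_sample_loss = -critic_scores                     # toy per-sample generator loss surrogate

keep = np.argsort(critic_scores)[-k:]                # the k samples the critic scores as most realistic
mask = np.zeros(batch_size)
mask[keep] = 1.0
generator_loss = (mask * per_sample_loss).sum() / k  # discarded samples contribute zero gradient

print("kept indices:", keep.tolist(), "generator loss:", float(generator_loss))
```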
arXiv Detail & Related papers (2020-02-14T19:27:50Z) - Dual Stochastic Natural Gradient Descent and convergence of interior
half-space gradient approximations [0.0]
Multinomial logistic regression (MLR) is widely used in statistics and machine learning.
Stochastic gradient descent (SGD) is the most common approach for determining the parameters of an MLR model in big data scenarios.
arXiv Detail & Related papers (2020-01-19T00:53:49Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)