Linear Speedup of Incremental Aggregated Gradient Methods on Streaming
Data
- URL: http://arxiv.org/abs/2309.04980v1
- Date: Sun, 10 Sep 2023 10:08:52 GMT
- Title: Linear Speedup of Incremental Aggregated Gradient Methods on Streaming
Data
- Authors: Xiaolu Wang, Cheng Jin, Hoi-To Wai, Yuantao Gu
- Abstract summary: This paper considers a type of incremental aggregated gradient (IAG) method for large-scale distributed optimization.
We show that the streaming IAG method achieves linear speedup when the workers are updating frequently enough.
- Score: 38.54333970135826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers a type of incremental aggregated gradient (IAG) method
for large-scale distributed optimization. The IAG method is well suited for the
parameter server architecture as the latter can easily aggregate potentially
stale gradients contributed by workers. Although the convergence of IAG in the
case of deterministic gradient is well known, there are only a few results for
the case of its stochastic variant based on streaming data. Considering
strongly convex optimization, this paper shows that the streaming IAG method
achieves linear speedup when the workers are updating frequently enough, even
if the data sample distributions across workers are heterogeneous. We show that
the expected squared distance to the optimal solution decays at $\mathcal{O}((1+T)/(nt))$,
where $n$ is the number of workers, $t$ is the iteration number, and $T/n$ is the
update frequency of workers. Our analysis involves careful treatments of the
conditional expectations with stale gradients and a recursive system with both
delayed and noise terms, which are new to the analysis of IAG-type algorithms.
Numerical results are presented to verify our findings.
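To make the update pattern concrete, the following is a minimal sketch of streaming IAG, assuming a toy least-squares model with heterogeneous worker distributions, a round-robin reporting schedule, and an illustrative step size: the parameter server keeps the most recent (possibly stale) stochastic gradient from each worker and descends along their average. All data and constants below are illustrative, not the paper's experimental setup.
```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 4                   # problem dimension and number of workers (illustrative)
x_star = rng.normal(size=d)   # ground-truth parameter of the least-squares model

# Heterogeneous streaming data: each worker draws features at its own scale.
scales = 1.0 + rng.uniform(size=n)

def streaming_grad(x, i):
    """Stochastic gradient of 0.5*(a^T x - b)^2 on one fresh sample from worker i."""
    a = scales[i] * rng.normal(size=d)
    b = a @ x_star + 0.1 * rng.normal()
    return (a @ x - b) * a

x = np.zeros(d)
eta = 0.01                                                       # step size (illustrative)
grad_table = np.stack([streaming_grad(x, i) for i in range(n)])  # server-side table of last-reported gradients

for t in range(20000):
    i = t % n                                 # round-robin updates: each entry's staleness stays bounded
    grad_table[i] = streaming_grad(x, i)      # worker i reports a fresh streaming gradient at the current iterate
    x = x - eta * grad_table.mean(axis=0)     # server aggregates the (partially stale) table and takes a step

print("squared distance to x*:", float(np.sum((x - x_star) ** 2)))
```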
Related papers
- AsGrad: A Sharp Unified Analysis of Asynchronous-SGD Algorithms [45.90015262911875]
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting.
As a by-product of our analysis, we also demonstrate convergence guarantees for gradient-type algorithms such as SGD with random reshuffling.
arXiv Detail & Related papers (2023-10-31T13:44:53Z) - Sampling from Gaussian Process Posteriors using Stochastic Gradient
Descent [43.097493761380186]
Stochastic gradient algorithms are an efficient method for approximately solving linear systems.
We show that gradient descent produces accurate predictions, even in cases where it does not converge quickly to the optimum.
Experimentally, gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks.
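As a rough illustration of that idea, the sketch below runs stochastic gradient descent on the quadratic whose minimizer solves the GP posterior-mean system (K + sigma^2 I) v = y. The RBF kernel, the subsampled gradient estimator, the step size, and the iteration count are illustrative assumptions, not the algorithm from the paper.
```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 200, 0.1
X = rng.uniform(-3.0, 3.0, size=(n, 1))
y = np.sin(X[:, 0]) + np.sqrt(sigma2) * rng.normal(size=n)

K = np.exp(-0.5 * (X - X.T) ** 2)        # RBF kernel matrix (lengthscale 1, illustrative)
A = K + sigma2 * np.eye(n)               # the posterior-mean representer weights solve A v = y

v = np.zeros(n)
eta, batch = 5e-3, 32
for t in range(20000):
    idx = rng.choice(n, size=batch, replace=False)
    # Unbiased estimate of the full gradient A v - y, subsampling the kernel sum K v.
    g = (n / batch) * K[:, idx] @ v[idx] + sigma2 * v - y
    v -= eta * g

mean_pred = K @ v                        # approximate posterior mean at the training inputs
print("relative residual:", float(np.linalg.norm(A @ v - y) / np.linalg.norm(y)))
```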
arXiv Detail & Related papers (2023-06-20T15:07:37Z) - Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a global optimum under the same delay assumptions.
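A minimal sketch of the delayed-gradient pattern behind such methods, under simplified assumptions: a fixed delay, a toy strongly convex quadratic, and a diminishing step size, with the master applying each stochastic gradient even though it was computed at an iterate several steps old.
```python
import numpy as np
from collections import deque

rng = np.random.default_rng(2)
d, delay = 10, 5
A = np.diag(np.linspace(0.5, 2.0, d))          # toy strongly convex quadratic 0.5 * x^T A x

def stoch_grad(x):
    return A @ x + 0.05 * rng.normal(size=d)   # noisy gradient oracle

x = np.ones(d)
in_flight = deque(x.copy() for _ in range(delay))  # iterates the workers are still computing on

for t in range(5000):
    stale_x = in_flight.popleft()              # a worker finishes a gradient based on an old iterate
    g = stoch_grad(stale_x)
    x = x - (1.0 / (t + 50)) * g               # master applies the delayed stochastic gradient
    in_flight.append(x.copy())                 # the next gradient will be computed at the current iterate

print("distance to the optimum:", float(np.linalg.norm(x)))
```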
arXiv Detail & Related papers (2021-07-06T21:59:49Z) - Stochastic Reweighted Gradient Descent [4.355567556995855]
We propose an importance-sampling-based algorithm we call SRG (stochastic reweighted gradient).
We pay particular attention to the time and memory overhead of our proposed method.
We present empirical results to support our findings.
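A hedged sketch of the reweighting idea: example i is drawn with probability p_i and its gradient is scaled by 1/(n p_i) so the update stays unbiased. Maintaining p_i from stored per-example gradient norms, mixed with the uniform distribution for stability, is one simple choice used here for illustration; the actual SRG rule and its overhead guarantees may differ.
```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 500, 10
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star + 0.1 * rng.normal(size=n)

x = np.zeros(d)
norms = np.ones(n)          # stored per-example gradient-norm estimates: O(n) extra memory
eta = 0.02

for t in range(20000):
    p = 0.5 * norms / norms.sum() + 0.5 / n   # mix with uniform so 1/(n*p_i) stays bounded
    i = rng.choice(n, p=p)
    g_i = (A[i] @ x - b[i]) * A[i]            # gradient of 0.5 * (a_i^T x - b_i)^2
    x = x - eta * g_i / (n * p[i])            # 1/(n*p_i) reweighting keeps the update unbiased
    norms[i] = np.linalg.norm(g_i) + 1e-8     # refresh the stored norm for the sampled example

print("parameter error:", float(np.linalg.norm(x - x_star)))
```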
arXiv Detail & Related papers (2021-03-23T04:09:43Z) - Stochastic Gradient Variance Reduction by Solving a Filtering Problem [0.951828574518325]
Deep neural networks (DNN) are typically optimized using stochastic gradient descent (SGD).
The estimation of the gradient from samples tends to be noisy and unreliable, resulting in large gradient variance and poor convergence.
We propose Filter Gradient Descent (FGD), an efficient optimization algorithm that makes consistent estimates of the gradient.
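To illustrate the idea of filtering noisy gradients, the sketch below uses a simple exponential moving-average filter as a stand-in; the paper formulates the estimator as a filtering problem, and its actual filter design, objective, and noise model may differ from the illustrative ones here.
```python
import numpy as np

rng = np.random.default_rng(4)
d = 20
x_star = rng.normal(size=d)

def noisy_grad(x):
    return (x - x_star) + 1.0 * rng.normal(size=d)   # gradient of 0.5*||x - x*||^2 plus heavy noise

x = np.zeros(d)
g_filt = np.zeros(d)
eta, beta = 0.1, 0.9

for t in range(3000):
    g = noisy_grad(x)
    g_filt = beta * g_filt + (1.0 - beta) * g        # filter the raw stochastic gradient
    x = x - eta * g_filt                             # descend along the filtered, lower-variance estimate

print("distance to the optimum:", float(np.linalg.norm(x - x_star)))
```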
arXiv Detail & Related papers (2020-12-22T23:48:42Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
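A hedged sketch of coordinate-wise zeroth-order estimation with importance sampling: a few coordinates are sampled from a non-uniform distribution, estimated by central finite differences, and reweighted to keep the gradient estimate unbiased, so each step costs only a handful of function queries. The test function, sampling weights, and step sizes are illustrative assumptions rather than the paper's exact scheme.
```python
import numpy as np

rng = np.random.default_rng(5)
d = 20
H = np.diag(np.linspace(0.5, 5.0, d))

def f(x):
    return 0.5 * x @ H @ x                 # black-box objective: only function values are queried

p = np.linspace(1.0, 2.0, d); p /= p.sum() # non-uniform coordinate-sampling weights (illustrative)
x = np.ones(d)
eta, mu, k = 0.02, 1e-4, 4                 # step size, finite-difference radius, coordinate draws per step

for t in range(5000):
    g_hat = np.zeros(d)
    for j in rng.choice(d, size=k, replace=True, p=p):   # k independent coordinate draws
        e = np.zeros(d); e[j] = mu
        deriv = (f(x + e) - f(x - e)) / (2.0 * mu)       # central-difference estimate of df/dx_j
        g_hat[j] += deriv / (k * p[j])                   # 1/(k*p_j) reweighting makes g_hat unbiased
    x = x - eta * g_hat

print("f(x) after ZO descent:", float(f(x)))
```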
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Gradient Coding with Dynamic Clustering for Straggler Mitigation [57.9123881133818]
GC-DC regulates the number of straggling workers in each cluster based on the straggler behavior in the previous iteration.
We numerically show that GC-DC provides significant improvements in the average completion time (of each iteration) with no increase in the communication load compared to the original GC scheme.
arXiv Detail & Related papers (2020-11-03T18:52:15Z) - Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad
Samples [67.11669996924671]
We introduce a simple (one line of code) modification to the Generative Adversarial Network (GAN) training algorithm.
When updating the generator parameters, we zero out the gradient contributions from the elements of the batch that the critic scores as least realistic.
We show that this 'top-k update' procedure is a generally applicable improvement.
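A minimal, framework-agnostic sketch of the selection step: only the k batch elements the critic scores as most realistic contribute to the generator update, so the rest contribute zero gradient. The toy critic scores and per-sample loss surrogate below stand in for an actual GAN training loop.
```python
import numpy as np

rng = np.random.default_rng(6)
batch_size, k = 8, 5
critic_scores = rng.normal(size=batch_size)          # critic outputs for a batch of generated samples
per_sample_loss = -critic_scores                     # toy per-sample generator loss surrogate

keep = np.argsort(critic_scores)[-k:]                # the k samples the critic scores as most realistic
mask = np.zeros(batch_size)
mask[keep] = 1.0
generator_loss = (mask * per_sample_loss).sum() / k  # discarded samples contribute zero gradient

print("kept indices:", keep.tolist(), "generator loss:", float(generator_loss))
```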
arXiv Detail & Related papers (2020-02-14T19:27:50Z) - Dual Stochastic Natural Gradient Descent and convergence of interior
half-space gradient approximations [0.0]
Multinomial logistic regression (MLR) is widely used in statistics and machine learning.
Stochastic gradient descent (SGD) is the most common approach for determining the parameters of an MLR model in big data scenarios.
arXiv Detail & Related papers (2020-01-19T00:53:49Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)