A fast randomized incremental gradient method for decentralized
non-convex optimization
- URL: http://arxiv.org/abs/2011.03853v3
- Date: Thu, 30 Sep 2021 19:42:31 GMT
- Title: A fast randomized incremental gradient method for decentralized
non-convex optimization
- Authors: Ran Xin and Usman A. Khan and Soummya Kar
- Abstract summary: We analyze a single-timescale randomized incremental gradient method, called GT-SAGA, for decentralized non-convex finite-sum problems.
We show the almost sure and mean-squared convergence of GT-SAGA to a first-order stationary point.
- Score: 19.540926205375857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study decentralized non-convex finite-sum minimization problems described
over a network of nodes, where each node possesses a local batch of data
samples. In this context, we analyze a single-timescale randomized incremental
gradient method, called GT-SAGA. GT-SAGA is computationally efficient as it
evaluates one component gradient per node per iteration and achieves provably
fast and robust performance by leveraging node-level variance reduction and
network-level gradient tracking. For general smooth non-convex problems, we
show the almost sure and mean-squared convergence of GT-SAGA to a first-order
stationary point and further describe regimes of practical significance where
it outperforms the existing approaches and achieves a network
topology-independent iteration complexity respectively. When the global
function satisfies the Polyak-Łojasiewicz condition, we show that GT-SAGA
exhibits linear convergence to an optimal solution in expectation and describe
regimes of practical interest where the performance is network
topology-independent and improves upon the existing methods. Numerical
experiments are included to highlight the main convergence aspects of GT-SAGA
in non-convex settings.
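To make the abstract's description more concrete, the sketch below illustrates the two ingredients GT-SAGA combines: a SAGA-style estimator built from one randomly selected component gradient per node per iteration, and a gradient-tracking recursion over a doubly stochastic mixing matrix. It is an illustrative reconstruction under stated assumptions (the `grad_fn` interface, the update ordering, and the constant step size `alpha` are choices made here for readability), not the authors' reference implementation.

```python
# Minimal sketch of a GT-SAGA-style iteration (illustrative, not the paper's pseudocode).
import numpy as np

def gt_saga(grad_fn, X0, W, local_sizes, alpha=1e-2, iters=1000, rng=None):
    """
    grad_fn(i, j, x): gradient of the j-th local component function at node i, evaluated at x.
    X0: (n, d) array of initial iterates, one row per node.
    W:  (n, n) doubly stochastic mixing matrix matching the network topology.
    local_sizes: number of local data samples m_i at each node.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X0.shape
    X = X0.astype(float).copy()
    # SAGA gradient tables: most recent gradient stored for each local component.
    tables = [np.stack([grad_fn(i, j, X[i]) for j in range(local_sizes[i])])
              for i in range(n)]
    g_prev = np.stack([tables[i].mean(axis=0) for i in range(n)])  # initial local estimators
    Y = g_prev.copy()                                              # gradient trackers
    for _ in range(iters):
        # Mix the iterates over the network and descend along the local tracker.
        X = W @ X - alpha * Y
        G = np.empty_like(g_prev)
        for i in range(n):
            j = rng.integers(local_sizes[i])      # one component gradient per node per iteration
            g_new = grad_fn(i, j, X[i])
            # Node-level variance reduction: correct the stale table entry by the table average.
            G[i] = g_new - tables[i][j] + tables[i].mean(axis=0)
            tables[i][j] = g_new
        # Network-level gradient tracking: mix trackers, add the change in local estimators.
        Y = W @ Y + G - g_prev
        g_prev = G
    return X
```

For a connected network, W can be any doubly stochastic matrix respecting the topology (for example, Metropolis weights); under the abstract's guarantees, the rows of X approach a common first-order stationary point.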
Related papers
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Improved Overparametrization Bounds for Global Convergence of Stochastic
Gradient Descent for Shallow Neural Networks [1.14219428942199]
We study the overparametrization bounds required for the global convergence of the stochastic gradient descent algorithm for a class of one-hidden-layer feed-forward neural networks.
arXiv Detail & Related papers (2022-01-28T11:30:06Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a global optimal solution under mild assumptions on the delays.
arXiv Detail & Related papers (2021-07-06T21:59:49Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, along with variations of the method that yield favorable convergence.
We prove that, when augmented with iteration averaging, SEG provably converges to the Nash equilibrium, and that this rate is accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - A Retrospective Approximation Approach for Smooth Stochastic
Optimization [0.2867517731896504]
Stochastic Gradient (SG) is the de facto iterative technique for solving stochastic optimization (SO) problems with a smooth (possibly non-convex) objective.
arXiv Detail & Related papers (2021-03-07T16:29:36Z) - A hybrid variance-reduced method for decentralized stochastic non-convex
optimization [15.447966950703947]
The GT-HSGD algorithm implements specialized local hybrid stochastic gradient estimators that are fused over the network to track the global gradient.
GT-HSGD achieves a network topology-independent oracle complexity, scaling as $O(n^{-1})$ in the network size, when the required error tolerance $\epsilon$ is small enough.
arXiv Detail & Related papers (2021-02-12T20:13:05Z) - Push-SAGA: A decentralized stochastic algorithm with variance reduction
over directed graphs [18.53372294049107]
Push-SAGA is a decentralized stochastic first-order method for finite-sum minimization over a directed network of nodes.
We show that Push-SAGA achieves linear convergence for smooth and strongly convex problems.
arXiv Detail & Related papers (2020-08-13T18:52:17Z) - An improved convergence analysis for decentralized online stochastic
non-convex optimization [17.386715847732468]
In this paper, we show that decentralized stochastic gradient descent with gradient tracking achieves improved convergence rates when the global function satisfies the Polyak-Łojasiewicz condition.
The results are not only immediately applicable but are also the currently known best convergence rates.
arXiv Detail & Related papers (2020-08-10T15:29:13Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.