Push-SAGA: A decentralized stochastic algorithm with variance reduction
over directed graphs
- URL: http://arxiv.org/abs/2008.06082v2
- Date: Thu, 22 Oct 2020 20:46:52 GMT
- Title: Push-SAGA: A decentralized stochastic algorithm with variance reduction
over directed graphs
- Authors: Muhammad I. Qureshi and Ran Xin and Soummya Kar and Usman A. Khan
- Abstract summary: Push-SAGA is a decentralized stochastic first-order method for finite-sum minimization over a directed network of nodes.
We show that Push-SAGA achieves linear convergence for smooth and strongly convex problems.
- Score: 18.53372294049107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose Push-SAGA, a decentralized stochastic first-order
method for finite-sum minimization over a directed network of nodes. Push-SAGA
combines node-level variance reduction to remove the uncertainty caused by
stochastic gradients, network-level gradient tracking to address the
distributed nature of the data, and push-sum consensus to tackle the challenge
of directed communication links. We show that Push-SAGA achieves linear
convergence to the exact solution for smooth and strongly convex problems and
is thus the first linearly-convergent stochastic algorithm over arbitrary
strongly connected directed graphs. We also characterize the regimes in which
Push-SAGA achieves a linear speed-up compared to its centralized counterpart
and achieves a network-independent convergence rate. We illustrate the behavior
and convergence properties of Push-SAGA with the help of numerical experiments
on strongly convex and non-convex problems.
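To make the three ingredients concrete, the following is a minimal NumPy sketch of a Push-SAGA-style iteration on a toy decentralized least-squares problem. The ring topology, column-stochastic weights, step size, and the exact form of the recursions (push-sum de-biasing, per-node SAGA gradient tables, and gradient tracking) are illustrative assumptions for this sketch, not the paper's precise algorithm or tuning.
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: n nodes, each holding m least-squares samples in R^d.
n, m, d = 8, 20, 5
A = rng.normal(size=(n, m, d))               # A[i, j] is sample j held by node i
x_true = rng.normal(size=d)
y = A @ x_true + 0.01 * rng.normal(size=(n, m))

def sample_grad(i, j, x):
    """Gradient of the single-sample loss 0.5*(a_ij^T x - y_ij)^2."""
    a = A[i, j]
    return (a @ x - y[i, j]) * a

# Directed ring with self-loops; columns of B sum to 1 (column-stochastic),
# which is the weight structure push-sum needs over a directed graph.
B = np.zeros((n, n))
for i in range(n):
    B[i, i] = 0.5
    B[(i + 1) % n, i] = 0.5                  # node i sends only to node i+1

alpha = 0.005                                 # constant step size (illustrative)
z = np.zeros((n, d))                          # push-sum numerator states
v = np.ones(n)                                # push-sum de-biasing weights
x = z / v[:, None]                            # de-biased local estimates
table = np.array([[sample_grad(i, j, x[i]) for j in range(m)] for i in range(n)])
table_mean = table.mean(axis=1)               # per-node mean of stored gradients
g_prev = table_mean.copy()                    # previous local gradient estimates
w = g_prev.copy()                             # gradient-tracking variables

for k in range(8000):
    # 1) push-sum mixing of the descent step, then de-biasing
    z = B @ (z - alpha * w)
    v = B @ v
    x = z / v[:, None]

    # 2) node-level SAGA estimator: sample one local index, correct with the table
    g = np.empty_like(g_prev)
    for i in range(n):
        j = rng.integers(m)
        gij = sample_grad(i, j, x[i])
        g[i] = gij - table[i, j] + table_mean[i]
        table_mean[i] += (gij - table[i, j]) / m
        table[i, j] = gij

    # 3) network-level gradient tracking of the average gradient estimate
    w = B @ w + g - g_prev
    g_prev = g

x_star = np.linalg.lstsq(A.reshape(-1, d), y.reshape(-1), rcond=None)[0]
print("max deviation of any node from the least-squares solution:",
      np.abs(x - x_star).max())
```
With a sufficiently small constant step size, every node's de-biased iterate x_i = z_i / v_i should settle near the global least-squares solution, which is the behavior the paper's linear-convergence result describes for smooth and strongly convex costs.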
Related papers
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - On Arbitrary Compression for Decentralized Consensus and Stochastic
Optimization over Directed Networks [0.6526824510982799]
We propose an iterative algorithm that compresses messages according to a desired compression ratio.
Contrary to the existing literature, we allow arbitrary compression ratios.
We show explicit convergence rates for decentralized optimization problems on smooth functions.
arXiv Detail & Related papers (2022-04-18T04:41:56Z) - Variance reduced stochastic optimization over directed graphs with row
and column stochastic weights [18.53372294049107]
AB-SAGA is a distributed stochastic algorithm for finite-sum minimization of strongly convex functions.
We show that for a constant step-size, AB-SAGA converges linearly to the global optimum.
arXiv Detail & Related papers (2022-02-07T16:44:48Z) - Communication-Efficient Federated Learning via Quantized Compressed
Sensing [82.10695943017907]
The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server.
Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression.
We demonstrate that the framework achieves almost identical performance to the case without compression.
arXiv Detail & Related papers (2021-11-30T02:13:54Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Directional Convergence Analysis under Spherically Symmetric
Distribution [21.145823611499104]
We consider the fundamental problem of learning linear predictors (i.e., separable datasets with zero margin) using neural networks with gradient flow or gradient descent.
We show directional convergence guarantees with exact convergence rate for two-layer non-linear networks with only two hidden nodes, and (deep) linear networks.
arXiv Detail & Related papers (2021-05-09T08:59:58Z) - A fast randomized incremental gradient method for decentralized
non-convex optimization [19.540926205375857]
We analyze a single-timescale randomized incremental gradient method, called GT-SAGA, for decentralized non-convex finite-sum problems.
We show the almost sure convergence of GT-SAGA to a first-order stationary point.
arXiv Detail & Related papers (2020-11-07T21:30:42Z) - Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes the first LinEAr convergent Decentralized algorithm with compression, LEAD.
arXiv Detail & Related papers (2020-07-01T04:35:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences.