Push-SAGA: A decentralized stochastic algorithm with variance reduction
over directed graphs
- URL: http://arxiv.org/abs/2008.06082v2
- Date: Thu, 22 Oct 2020 20:46:52 GMT
- Title: Push-SAGA: A decentralized stochastic algorithm with variance reduction
over directed graphs
- Authors: Muhammad I. Qureshi and Ran Xin and Soummya Kar and Usman A. Khan
- Abstract summary: Push-SAGA is a decentralized stochastic first-order method for finite-sum minimization over a directed network of nodes.
We show that Push-SAGA achieves linear convergence for smooth and strongly convex problems.
- Score: 18.53372294049107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose Push-SAGA, a decentralized stochastic first-order
method for finite-sum minimization over a directed network of nodes. Push-SAGA
combines node-level variance reduction to remove the uncertainty caused by
stochastic gradients, network-level gradient tracking to address the
distributed nature of the data, and push-sum consensus to tackle the challenge
of directed communication links. We show that Push-SAGA achieves linear
convergence to the exact solution for smooth and strongly convex problems and
is thus the first linearly-convergent stochastic algorithm over arbitrary
strongly connected directed graphs. We also characterize the regimes in which
Push-SAGA achieves a linear speed-up compared to its centralized counterpart
and achieves a network-independent convergence rate. We illustrate the behavior
and convergence properties of Push-SAGA with the help of numerical experiments
on strongly convex and non-convex problems.
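To make the three ingredients concrete, the following is a minimal NumPy sketch of a Push-SAGA-style iteration on a toy decentralized least-squares problem. The ring topology, column-stochastic weights, step size, and the exact form of the recursions (push-sum de-biasing, per-node SAGA gradient tables, and gradient tracking) are illustrative assumptions for this sketch, not the paper's precise algorithm or tuning.
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: n nodes, each holding m least-squares samples in R^d.
n, m, d = 8, 20, 5
A = rng.normal(size=(n, m, d))               # A[i, j] is sample j held by node i
x_true = rng.normal(size=d)
y = A @ x_true + 0.01 * rng.normal(size=(n, m))

def sample_grad(i, j, x):
    """Gradient of the single-sample loss 0.5*(a_ij^T x - y_ij)^2."""
    a = A[i, j]
    return (a @ x - y[i, j]) * a

# Directed ring with self-loops; columns of B sum to 1 (column-stochastic),
# which is the weight structure push-sum needs over a directed graph.
B = np.zeros((n, n))
for i in range(n):
    B[i, i] = 0.5
    B[(i + 1) % n, i] = 0.5                  # node i sends only to node i+1

alpha = 0.005                                 # constant step size (illustrative)
z = np.zeros((n, d))                          # push-sum numerator states
v = np.ones(n)                                # push-sum de-biasing weights
x = z / v[:, None]                            # de-biased local estimates
table = np.array([[sample_grad(i, j, x[i]) for j in range(m)] for i in range(n)])
table_mean = table.mean(axis=1)               # per-node mean of stored gradients
g_prev = table_mean.copy()                    # previous local gradient estimates
w = g_prev.copy()                             # gradient-tracking variables

for k in range(8000):
    # 1) push-sum mixing of the descent step, then de-biasing
    z = B @ (z - alpha * w)
    v = B @ v
    x = z / v[:, None]

    # 2) node-level SAGA estimator: sample one local index, correct with the table
    g = np.empty_like(g_prev)
    for i in range(n):
        j = rng.integers(m)
        gij = sample_grad(i, j, x[i])
        g[i] = gij - table[i, j] + table_mean[i]
        table_mean[i] += (gij - table[i, j]) / m
        table[i, j] = gij

    # 3) network-level gradient tracking of the average gradient estimate
    w = B @ w + g - g_prev
    g_prev = g

x_star = np.linalg.lstsq(A.reshape(-1, d), y.reshape(-1), rcond=None)[0]
print("max deviation of any node from the least-squares solution:",
      np.abs(x - x_star).max())
```
With a sufficiently small constant step size, every node's de-biased iterate x_i = z_i / v_i should settle near the global least-squares solution, which is the behavior the paper's linear-convergence result describes for smooth and strongly convex costs.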
Related papers
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - On Arbitrary Compression for Decentralized Consensus and Stochastic
Optimization over Directed Networks [0.6526824510982799]
We propose an iterative algorithm that compresses messages according to a desired compression ratio.
Contrary to the existing literature, we allow arbitrary compression ratios.
We show explicit convergence rates for decentralized optimization problems on smooth functions.
arXiv Detail & Related papers (2022-04-18T04:41:56Z) - Variance reduced stochastic optimization over directed graphs with row
and column stochastic weights [18.53372294049107]
AB-SAGA is a distributed stochastic algorithm for finite-sum minimization of strongly convex functions.
We show that for a constant step-size, AB-SAGA converges linearly to the global optimum.
arXiv Detail & Related papers (2022-02-07T16:44:48Z) - Communication-Efficient Federated Learning via Quantized Compressed
Sensing [82.10695943017907]
The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server.
Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression.
We demonstrate that the framework achieves almost identical performance to the case without compression.
arXiv Detail & Related papers (2021-11-30T02:13:54Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Directional Convergence Analysis under Spherically Symmetric
Distribution [21.145823611499104]
We consider the fundamental problem of learning linear predictors (i.e., separable datasets with zero margin) using neural networks with gradient flow or gradient descent.
We show directional convergence guarantees with exact convergence rate for two-layer non-linear networks with only two hidden nodes, and (deep) linear networks.
arXiv Detail & Related papers (2021-05-09T08:59:58Z) - A fast randomized incremental gradient method for decentralized
non-convex optimization [19.540926205375857]
We analyze a single-timescale randomized incremental gradient method, called GT-SAGA, for decentralized non-convex finite-sum problems.
We show the almost sure convergence of GT-SAGA to a first-order stationary point.
arXiv Detail & Related papers (2020-11-07T21:30:42Z) - Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes the first LinEAr convergent Decentralized algorithm with compression, LEAD.
arXiv Detail & Related papers (2020-07-01T04:35:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences.