Variance reduced stochastic optimization over directed graphs with row
and column stochastic weights
- URL: http://arxiv.org/abs/2202.03346v1
- Date: Mon, 7 Feb 2022 16:44:48 GMT
- Title: Variance reduced stochastic optimization over directed graphs with row
and column stochastic weights
- Authors: Muhammad I. Qureshi, Ran Xin, Soummya Kar, Usman A. Khan
- Abstract summary: AB-SAGA is a finite-sum distributed over strongly convex functions.
We show that for a constant step-size, AB-SAGA converges linearly to the global optimal.
- Score: 18.53372294049107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes AB-SAGA, a first-order distributed stochastic
optimization method to minimize a finite-sum of smooth and strongly convex
functions distributed over an arbitrary directed graph. AB-SAGA removes the
uncertainty caused by the stochastic gradients using a node-level variance
reduction and subsequently employs network-level gradient tracking to address
the data dissimilarity across the nodes. Unlike existing methods that use the
nonlinear push-sum correction to cancel the imbalance caused by the directed
communication, the consensus updates in AB-SAGA are linear and uses both row
and column stochastic weights. We show that for a constant step-size, AB-SAGA
converges linearly to the global optimal. We quantify the directed nature of
the underlying graph using an explicit directivity constant and characterize
the regimes in which AB-SAGA achieves a linear speed-up over its centralized
counterpart. Numerical experiments illustrate the convergence of AB-SAGA for
strongly convex and nonconvex problems.
Related papers
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonIBS optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded
Gradients and Affine Variance [46.15915820243487]
We show that AdaGrad-Norm exhibits an order optimal convergence of $mathcalOleft.
We show that AdaGrad-Norm exhibits an order optimal convergence of $mathcalOleft.
arXiv Detail & Related papers (2022-02-11T17:37:54Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Alternating minimization algorithms for graph regularized tensor
completion [8.26185178671935]
We consider a Canonical Polyadic (CP) decomposition approach to low-rank tensor completion (LRTC)
The usage of graph regularization entails benefits in the learning accuracy of LRTC, but at the same time, induces coupling graph Laplacian terms.
We propose efficient alternating minimization algorithms by leveraging the block structure of the underlying CP decomposition-based model.
arXiv Detail & Related papers (2020-08-28T23:20:49Z) - Push-SAGA: A decentralized stochastic algorithm with variance reduction
over directed graphs [18.53372294049107]
Push-SAGA is a decentralized first-order method for a finite first-order method for a directed network of nodes.
We show that Push-SAGA achieves linear convergence for smooth and convex problems.
arXiv Detail & Related papers (2020-08-13T18:52:17Z) - An improved convergence analysis for decentralized online stochastic
non-convex optimization [17.386715847732468]
In this paper, we show that a technique called GT-Loakjasiewics (GT-Loakjasiewics) satisfies the existing condition GT-Loakjasiewics (GT-Loakjasiewics) satisfies the current best convergence rates.
The results are not only immediately applicable but also the currently known best convergence rates.
arXiv Detail & Related papers (2020-08-10T15:29:13Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - S-ADDOPT: Decentralized stochastic first-order optimization over
directed graphs [16.96562173221624]
Decentralized convex optimization is proposed to minimize a sum of smooth and strongly cost functions when the functions are distributed over a directed network nodes.
In particular, we propose thetextbftextttS-ADDOPT algorithm that assumes a first-order oracle at each node.
For decaying step-sizes$mathcalO (1/k)$, we show thattextbfttS-ADDOPT reaches the exact solution subly at$mathcalO (1/k)$ and its convergence is networkally-independent
arXiv Detail & Related papers (2020-05-15T21:14:22Z) - GradientDICE: Rethinking Generalized Offline Estimation of Stationary
Values [75.17074235764757]
We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution.
GenDICE is the state-of-the-art for estimating such density ratios.
arXiv Detail & Related papers (2020-01-29T22:10:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.