S-ADDOPT: Decentralized stochastic first-order optimization over
directed graphs
- URL: http://arxiv.org/abs/2005.07785v3
- Date: Wed, 22 Jul 2020 19:15:28 GMT
- Title: S-ADDOPT: Decentralized stochastic first-order optimization over
directed graphs
- Authors: Muhammad I. Qureshi, Ran Xin, Soummya Kar, and Usman A. Khan
- Abstract summary: Decentralized stochastic optimization is studied to minimize a sum of smooth and strongly convex cost functions distributed over a directed network of nodes.
In particular, we propose the \textbf{\texttt{S-ADDOPT}} algorithm that assumes a stochastic first-order oracle at each node.
For decaying step-sizes $\mathcal{O}(1/k)$, we show that \textbf{\texttt{S-ADDOPT}} reaches the exact solution sublinearly at $\mathcal{O}(1/k)$ and its convergence is asymptotically network-independent.
- Score: 16.96562173221624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report, we study decentralized stochastic optimization to minimize a
sum of smooth and strongly convex cost functions when the functions are
distributed over a directed network of nodes. In contrast to the existing work,
we use gradient tracking to improve certain aspects of the resulting algorithm.
In particular, we propose the~\textbf{\texttt{S-ADDOPT}} algorithm that assumes
a stochastic first-order oracle at each node and show that for a constant
step-size~$\alpha$, each node converges linearly inside an error ball around
the optimal solution, the size of which is controlled by~$\alpha$. For decaying
step-sizes~$\mathcal{O}(1/k)$, we show that~\textbf{\texttt{S-ADDOPT}} reaches
the exact solution sublinearly at~$\mathcal{O}(1/k)$ and its convergence is
asymptotically network-independent. Thus the asymptotic behavior
of~\textbf{\texttt{S-ADDOPT}} is comparable to the centralized stochastic
gradient descent. Numerical experiments over both strongly convex and
non-convex problems illustrate the convergence behavior and the performance
comparison of the proposed algorithm.
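The abstract describes gradient tracking with a stochastic first-order oracle over a directed graph. Below is a minimal numerical sketch of an S-ADDOPT-style recursion (push-sum with column-stochastic weights combined with gradient tracking, following the ADDOPT template); the ring topology, quadratic local costs, noise model, and step-size schedule are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

# Minimal sketch of an S-ADDOPT-style recursion: push-sum (column-stochastic
# weights B) combined with gradient tracking, driven by a stochastic
# first-order oracle. Topology, costs, and noise model are illustrative.

rng = np.random.default_rng(0)
n = 5
c = np.arange(n, dtype=float)          # local costs f_i(x) = 0.5 * (x - c_i)^2
x_star = c.mean()                      # minimizer of the global sum

# Directed ring plus one extra edge (0 -> 2), so B is column- but not
# row-stochastic and the push-sum correction y is genuinely needed.
B = np.zeros((n, n))
for i in range(n):
    outs = [i, (i + 1) % n] + ([2] if i == 0 else [])
    for j in outs:
        B[j, i] = 1.0 / len(outs)

def oracle(x_i, i):
    """Stochastic first-order oracle: exact gradient plus small noise."""
    return (x_i - c[i]) + 0.01 * rng.standard_normal()

z = np.zeros(n)                        # push-sum numerators
y = np.ones(n)                         # push-sum weights
x = z / y                              # node estimates
g_prev = np.array([oracle(x[i], i) for i in range(n)])
w = g_prev.copy()                      # gradient tracker, w_0 = g_0

for k in range(2000):
    alpha = 2.0 / (k + 10)             # O(1/k) decaying step-size
    z = B @ z - alpha * w
    y = B @ y
    x = z / y
    g_new = np.array([oracle(x[i], i) for i in range(n)])
    w = B @ w + g_new - g_prev         # track the average stochastic gradient
    g_prev = g_new

err = float(np.max(np.abs(x - x_star)))
print(err)                             # every node ends close to x_star
```

With the decaying step-size, all nodes approach the minimizer of the global sum despite each node seeing only its own noisy local gradients, which is the qualitative behavior the abstract claims.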
Related papers
- Graphon Particle Systems, Part II: Dynamics of Distributed Stochastic Continuum Optimization [5.685037987395183]
We study the distributed optimization problem over a graphon with a continuum of nodes.
We propose gradient descent and gradient tracking algorithms over the graphon.
We show that by choosing the time-varying algorithm gains properly, all nodes' states achieve $\mathcal{L}^{\infty}$-consensus for a connected graphon.
arXiv Detail & Related papers (2024-07-03T02:47:39Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to the large-scale setting and demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Multi-block-Single-probe Variance Reduced Estimator for Coupled
Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-Single-probe Variance Reduced (MSVR) estimator to alleviate the complexity of compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on strong convexity.
arXiv Detail & Related papers (2022-07-18T12:03:26Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}\, F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an \emph{accelerated gradient-extragradient} (AG-EG) descent-ascent algorithm that combines extragradient steps with acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Stochastic Zeroth order Descent with Structured Directions [14.338969001937068]
We introduce and analyze Structured Stochastic Zeroth order Descent (S-SZD), a finite-difference approach which approximates the gradient along a set of $l \leq d$ directions, where $d$ is the dimension of the ambient space.
For convex functions, we prove almost sure convergence and a convergence rate of $O(k^{-c})$ for every $c < 1/2$, which is arbitrarily close to the one of Stochastic Gradient Descent (SGD) in terms of the number of iterations.
arXiv Detail & Related papers (2022-06-10T14:00:06Z) - Convergence Analysis of Nonconvex Distributed Stochastic Zeroth-order
Coordinate Method [3.860616339202303]
This paper investigates the distributed nonconvex optimization problem of minimizing a global cost function formed by the summation of $n$ local cost functions.
The agents solve the problem cooperatively using a distributed stochastic zeroth-order (ZO) coordinate method.
arXiv Detail & Related papers (2021-03-24T03:07:46Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in its iterations caused by inexact solutions of the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [5.269633789700637]
We show that, when a mild one-point Hessian continuity condition is imposed, the rescaled last iterate of \textsf{ROOT-SGD} converges to a Gaussian limit with the Cramér-Rao optimal asymptotic covariance, for a broad range of step-size choices.
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - Fast decentralized non-convex finite-sum optimization with recursive
variance reduction [19.540926205375857]
We describe a first-order gradient method, called GT-SARAH, that employs a SARAH-type variance reduction technique.
In particular, in a big-data regime such that $n = O(N^{1/2}(1-\lambda)^{3})$, this gradient complexity reduces to $O(N^{1/2}L\epsilon^{-2})$, independent of the network topology.
In addition, we show appropriate choices of local minibatch size balance the trade-offs between gradient complexity and communication complexity.
arXiv Detail & Related papers (2020-08-17T15:51:32Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD).
We show that SGD avoids strict saddle points/manifolds with probability $1$ for a broad class of step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.