An improved convergence analysis for decentralized online stochastic
non-convex optimization
- URL: http://arxiv.org/abs/2008.04195v2
- Date: Tue, 29 Dec 2020 02:20:10 GMT
- Title: An improved convergence analysis for decentralized online stochastic
non-convex optimization
- Authors: Ran Xin, Usman A. Khan, and Soummya Kar
- Abstract summary: In this paper, we show that a technique called GT-Loakjasiewics (GT-Loakjasiewics) satisfies the existing condition GT-Loakjasiewics (GT-Loakjasiewics) satisfies the current best convergence rates.
The results are not only immediately applicable but also the currently known best convergence rates.
- Score: 17.386715847732468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study decentralized online stochastic non-convex
optimization over a network of nodes. Integrating a technique called gradient
tracking in decentralized stochastic gradient descent, we show that the
resulting algorithm, GT-DSGD, enjoys certain desirable characteristics towards
minimizing a sum of smooth non-convex functions. In particular, for general
smooth non-convex functions, we establish non-asymptotic characterizations of
GT-DSGD and derive the conditions under which it achieves network-independent
performances that match the centralized minibatch SGD. In contrast, the
existing results suggest that GT-DSGD is always network-dependent and is
therefore strictly worse than the centralized minibatch SGD. When the global
non-convex function additionally satisfies the Polyak-Lojasiewics (PL)
condition, we establish the linear convergence of GT-DSGD up to a steady-state
error with appropriate constant step-sizes. Moreover, under stochastic
approximation step-sizes, we establish, for the first time, the optimal global
sublinear convergence rate on almost every sample path, in addition to the
asymptotically optimal sublinear rate in expectation. Since strongly convex
functions are a special case of the functions satisfying the PL condition, our
results are not only immediately applicable but also improve the currently
known best convergence rates and their dependence on problem parameters.
Related papers
- Achieving Near-Optimal Convergence for Distributed Minimax Optimization with Adaptive Stepsizes [22.022674600775993]
We show that applying adaptive methods directly to distributed minimax problems can result in non-convergence.
We propose D-AdaST, a Distributed Distributed minimax method with Tracking Tracking protocol.
arXiv Detail & Related papers (2024-06-05T04:54:36Z) - Convergence of mean-field Langevin dynamics: Time and space
discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
arXiv Detail & Related papers (2023-06-12T16:28:11Z) - On Convergence of Incremental Gradient for Non-Convex Smooth Functions [63.51187646914962]
In machine learning and network optimization, algorithms like shuffle SGD are popular due to minimizing the number of misses and good cache.
This paper delves into the convergence properties SGD algorithms with arbitrary data ordering.
arXiv Detail & Related papers (2023-05-30T17:47:27Z) - Fully Stochastic Trust-Region Sequential Quadratic Programming for
Equality-Constrained Optimization Problems [62.83783246648714]
We propose a sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with objectives and deterministic equality constraints.
The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices.
arXiv Detail & Related papers (2022-11-29T05:52:17Z) - Tackling benign nonconvexity with smoothing and stochastic gradients [28.197694894254305]
Non- optimization problems are common in machine learning, especially in Deep Learning.
In this paper we show how gradient descent can be used to solve such problems.
arXiv Detail & Related papers (2022-02-18T07:27:32Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the citetgarber 2020online, where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order.
We show that citetgarber 2020online algorithms achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - A Retrospective Approximation Approach for Smooth Stochastic
Optimization [0.2867517731896504]
Gradient (SG) is the defactorandom iterative technique to solve optimization (SO) problems with a smooth (non-fimation) objective $imation.
arXiv Detail & Related papers (2021-03-07T16:29:36Z) - A fast randomized incremental gradient method for decentralized
non-convex optimization [19.540926205375857]
We analyze a single-time randomized method called GT-SAGA GTSAGA for batch non- finite-sum context problems.
We show the almost sure convergence of GT-SAGA to a first-order gradient stationary point.
arXiv Detail & Related papers (2020-11-07T21:30:42Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel, referred to as Recursive One-Over-T SGD, based on an easily implementable, averaging of past gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and
Interpolation [17.199023009789308]
The Expected assumption of SGD (SGD) is being used routinely for non-artisan functions.
In this paper, we show a paradigms for convergence to a smooth non-linear setting.
We also provide theoretical guarantees for different step-size conditions.
arXiv Detail & Related papers (2020-06-18T07:05:56Z) - Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses [52.039438701530905]
We provide sharp upper and lower bounds for several forms of gradient descent (SGD) on arbitrary Lipschitz nonsmooth convex losses.
Our bounds allow us to derive a new algorithm for differentially private nonsmooth convex optimization with optimal excess population risk.
arXiv Detail & Related papers (2020-06-12T02:45:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.