An improved convergence analysis for decentralized online stochastic
non-convex optimization
- URL: http://arxiv.org/abs/2008.04195v2
- Date: Tue, 29 Dec 2020 02:20:10 GMT
- Title: An improved convergence analysis for decentralized online stochastic
non-convex optimization
- Authors: Ran Xin, Usman A. Khan, and Soummya Kar
- Abstract summary: We show that GT-DSGD, which integrates gradient tracking into decentralized stochastic gradient descent, achieves network-independent performance matching centralized minibatch SGD for smooth non-convex functions, and converges linearly up to a steady-state error when the global function additionally satisfies the Polyak-Lojasiewicz (PL) condition.
The results are not only immediately applicable but also improve the currently known best convergence rates.
- Score: 17.386715847732468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study decentralized online stochastic non-convex
optimization over a network of nodes. Integrating a technique called gradient
tracking in decentralized stochastic gradient descent, we show that the
resulting algorithm, GT-DSGD, enjoys certain desirable characteristics towards
minimizing a sum of smooth non-convex functions. In particular, for general
smooth non-convex functions, we establish non-asymptotic characterizations of
GT-DSGD and derive the conditions under which it achieves network-independent
performances that match the centralized minibatch SGD. In contrast, the
existing results suggest that GT-DSGD is always network-dependent and is
therefore strictly worse than the centralized minibatch SGD. When the global
non-convex function additionally satisfies the Polyak-Lojasiewicz (PL)
condition, we establish the linear convergence of GT-DSGD up to a steady-state
error with appropriate constant step-sizes. Moreover, under stochastic
approximation step-sizes, we establish, for the first time, the optimal global
sublinear convergence rate on almost every sample path, in addition to the
asymptotically optimal sublinear rate in expectation. Since strongly convex
functions are a special case of the functions satisfying the PL condition, our
results are not only immediately applicable but also improve the currently
known best convergence rates and their dependence on problem parameters.
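To connect the abstract to an update rule: in a common form of GT-DSGD, each node i maintains an iterate x_i and a gradient tracker y_i, mixes both with its neighbors through a doubly stochastic matrix W, and corrects the tracker with the difference of consecutive local stochastic gradients. The PL condition mentioned above requires 2*mu*(f(x) - f*) <= ||grad f(x)||^2 for some mu > 0. The sketch below is illustrative only, assuming synchronous full-vector exchanges and a constant step-size; the function name, interface, and the simulation of the network by a mixing matrix are assumptions for this sketch, not taken from the paper.

```python
import numpy as np

def gt_dsgd(stoch_grads, W, x0, alpha, num_iters, rng):
    """Minimal sketch of decentralized SGD with gradient tracking (GT-DSGD style).

    stoch_grads: list of callables; stoch_grads[i](x, rng) returns a stochastic
                 gradient of node i's local objective f_i at the point x.
    W:           (n, n) doubly stochastic mixing matrix of the network.
    x0:          (n, d) initial iterates, one row per node.
    alpha:       constant step-size.
    """
    n, d = x0.shape
    x = x0.copy()
    g = np.stack([stoch_grads[i](x[i], rng) for i in range(n)])  # local stochastic gradients
    y = g.copy()                                                 # gradient trackers, y_i^0 = g_i(x_i^0)
    for _ in range(num_iters):
        x_new = W @ (x - alpha * y)          # consensus step on the locally corrected descent direction
        g_new = np.stack([stoch_grads[i](x_new[i], rng) for i in range(n)])
        y = W @ y + g_new - g                # track the network-average stochastic gradient
        x, g = x_new, g_new
    return x.mean(axis=0)                    # network average as the returned estimate
```

As a quick sanity check, one could take local objectives such as f_i(x) = ||A_i x - b_i||^2 with additive Gaussian gradient noise and let W be the Metropolis weights of a connected graph; with a suitably small alpha the node iterates reach consensus while the tracked direction approximates the average gradient.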
Related papers
- Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results [60.92029979853314]
This paper investigates gradient-normalized SGD with clipping (NSGDC) and its variance-reduced variant (NSGDC-VR).
We present significant improvements in the theoretical results for both algorithms.
arXiv Detail & Related papers (2024-10-21T22:40:42Z) - Achieving Near-Optimal Convergence for Distributed Minimax Optimization with Adaptive Stepsizes [22.022674600775993]
We show that applying adaptive methods directly to distributed minimax problems can result in non-convergence.
We propose D-AdaST, a distributed adaptive minimax method with a stepsize tracking protocol.
arXiv Detail & Related papers (2024-06-05T04:54:36Z) - Convergence of mean-field Langevin dynamics: Time and space
discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
arXiv Detail & Related papers (2023-06-12T16:28:11Z) - On Convergence of Incremental Gradient for Non-Convex Smooth Functions [63.51187646914962]
In machine learning and network optimization, algorithms like shuffled SGD are popular because they minimize the number of cache misses and exhibit good cache locality.
This paper delves into the convergence properties of SGD algorithms with arbitrary data ordering.
arXiv Detail & Related papers (2023-05-30T17:47:27Z) - Fully Stochastic Trust-Region Sequential Quadratic Programming for
Equality-Constrained Optimization Problems [62.83783246648714]
We propose a trust-region stochastic sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints.
The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices.
arXiv Detail & Related papers (2022-11-29T05:52:17Z) - Tackling benign nonconvexity with smoothing and stochastic gradients [28.197694894254305]
Non-convex optimization problems are common in machine learning, especially in deep learning.
In this paper we show how stochastic gradient descent combined with smoothing can be used to solve such problems.
arXiv Detail & Related papers (2022-02-18T07:27:32Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random-order online optimization setting of Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order.
We show that algorithms in this setting achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - A Retrospective Approximation Approach for Smooth Stochastic
Optimization [0.2867517731896504]
Stochastic Gradient (SG) is the de facto iterative technique to solve stochastic optimization (SO) problems with a smooth (non-convex) objective.
arXiv Detail & Related papers (2021-03-07T16:29:36Z) - A fast randomized incremental gradient method for decentralized
non-convex optimization [19.540926205375857]
We analyze a single-timescale randomized incremental gradient method called GT-SAGA for decentralized non-convex finite-sum problems.
We show the almost sure convergence of GT-SAGA to a first-order stationary point.
arXiv Detail & Related papers (2020-11-07T21:30:42Z) - SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and
Interpolation [17.199023009789308]
Stochastic gradient descent (SGD) is being used routinely for optimizing non-convex functions.
In this paper, we establish convergence guarantees for SGD in a smooth non-convex setting.
We also provide theoretical guarantees for different step-size conditions.
arXiv Detail & Related papers (2020-06-18T07:05:56Z) - Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses [52.039438701530905]
We provide sharp upper and lower bounds for several forms of stochastic gradient descent (SGD) on arbitrary Lipschitz nonsmooth convex losses.
Our bounds allow us to derive a new algorithm for differentially private nonsmooth convex optimization with optimal excess population risk.
arXiv Detail & Related papers (2020-06-12T02:45:21Z)