A hybrid variance-reduced method for decentralized stochastic non-convex
optimization
- URL: http://arxiv.org/abs/2102.06752v1
- Date: Fri, 12 Feb 2021 20:13:05 GMT
- Title: A hybrid variance-reduced method for decentralized stochastic non-convex
optimization
- Authors: Ran Xin and Usman A. Khan and Soummya Kar
- Abstract summary: The \texttt{GT-HSGD} algorithm implements specialized local hybrid stochastic gradient estimators that are fused over the network to track the global gradient.
\texttt{GT-HSGD} achieves a network-independent oracle complexity of $O(n^{-1}\epsilon^{-3})$ when the required error tolerance $\epsilon$ is small enough.
- Score: 15.447966950703947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers decentralized stochastic optimization over a network
of~$n$ nodes, where each node possesses a smooth non-convex local cost function
and the goal of the networked nodes is to find an~$\epsilon$-accurate
first-order stationary point of the sum of the local costs. We focus on an
online setting, where each node accesses its local cost only by means of a
stochastic first-order oracle that returns a noisy version of the exact
gradient. In this context, we propose a novel single-loop decentralized hybrid
variance-reduced stochastic gradient method, called \texttt{GT-HSGD}, that
outperforms the existing approaches in terms of both the oracle complexity and
practical implementation. The \texttt{GT-HSGD} algorithm implements specialized
local hybrid stochastic gradient estimators that are fused over the network to
track the global gradient. Remarkably, \texttt{GT-HSGD} achieves a
network-independent oracle complexity of~$O(n^{-1}\epsilon^{-3})$ when the
required error tolerance~$\epsilon$ is small enough, leading to a linear
speedup with respect to the centralized optimal online variance-reduced
approaches that operate on a single node. Numerical experiments are provided to
illustrate our main technical results.
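To make the update structure concrete, the following is a minimal sketch of a decentralized hybrid variance-reduced loop of the kind described above: each node forms a hybrid (STORM-type) stochastic gradient estimator, and the estimators are fused via gradient tracking over a doubly stochastic mixing matrix. The function and parameter names, the default values, and the simplified sampling (a generic noisy-gradient oracle rather than reusing one minibatch at two consecutive iterates) are illustrative assumptions, not the paper's exact \texttt{GT-HSGD} pseudocode.

```python
import numpy as np

def gt_hsgd_sketch(grad_oracle, x0, W, alpha=0.01, beta=0.1, T=1000, seed=0):
    """Illustrative decentralized hybrid variance-reduced update loop.

    grad_oracle(i, x, rng): returns a noisy gradient of node i's local cost at x.
    W: (n, n) doubly stochastic mixing matrix encoding the network topology.
    alpha, beta, T: step size, hybrid mixing weight, number of rounds (assumed values).
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    x = np.tile(x0, (n, 1))                                      # one local iterate per node
    v = np.stack([grad_oracle(i, x[i], rng) for i in range(n)])  # hybrid gradient estimators
    y = v.copy()                                                 # gradient-tracking variables

    for _ in range(T):
        # consensus (mixing) step combined with a local descent step
        x_new = W @ (x - alpha * y)
        # hybrid (STORM-type) estimator: fresh stochastic gradient plus a weighted
        # SARAH-style correction; the real method evaluates both gradients on the
        # same fresh minibatch, which this generic oracle does not enforce
        v_new = np.empty_like(v)
        for i in range(n):
            g_new = grad_oracle(i, x_new[i], rng)
            g_old = grad_oracle(i, x[i], rng)
            v_new[i] = g_new + (1.0 - beta) * (v[i] - g_old)
        # gradient tracking: y asymptotically tracks the average of the local
        # estimators, i.e., an estimate of the global gradient
        y = W @ y + v_new - v
        x, v = x_new, v_new

    return x.mean(axis=0)   # average of the local iterates as a convenient summary
```

In practice, W would be, for example, a Metropolis weight matrix of a ring or mesh graph and grad_oracle a minibatch gradient over each node's local data; both are deliberately left abstract in this sketch.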
Related papers
- Online Optimization Perspective on First-Order and Zero-Order Decentralized Nonsmooth Nonconvex Stochastic Optimization [8.670873561640903]
We investigate the finite-time analysis of finding $(\delta,\epsilon)$-stationary points for nonsmooth nonconvex objectives in decentralized settings.
Ours is the first finite-time guarantee for decentralized nonsmooth nonconvex stochastic optimization.
arXiv Detail & Related papers (2024-06-03T16:09:34Z)
- Decentralized Riemannian Algorithm for Nonconvex Minimax Problems [82.50374560598493]
Minimax algorithms for neural networks have been developed to solve many problems.
In this paper, we propose two types of decentralized Riemannian minimax algorithms.
For the stochastic setting, we propose DRSGDA and prove that our method achieves a gradient complexity guarantee.
arXiv Detail & Related papers (2023-02-08T01:42:45Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning [58.79085525115987]
Local methods are one of the promising approaches to reduce communication time.
We show that the communication complexity of our method is better than that of non-local methods when the heterogeneity of the local datasets is smaller than the smoothness of the local loss.
arXiv Detail & Related papers (2022-02-12T15:12:17Z)
- Decentralized Multi-Task Stochastic Optimization With Compressed Communications [22.31884634659446]
The paper develops algorithms and obtains performance bounds for two different models of local information availability at the nodes.
We show that the deviation from the global minimum value and the violations of the constraints are upper-bounded by $\mathcal{O}(T^{-\frac{1}{2}})$ and $\mathcal{O}(T^{-\frac{1}{4}})$, respectively.
arXiv Detail & Related papers (2021-12-23T05:54:42Z)
- Decentralized Stochastic Proximal Gradient Descent with Variance Reduction over Time-varying Networks [30.231314171218994]
In decentralized learning, a network of nodes cooperate to minimize an overall objective function that is usually the finite-sum of their local objectives.
We propose a novel algorithm, namely DPSVRG, to accelerate the decentralized training by leveraging the variance reduction technique.
arXiv Detail & Related papers (2021-12-20T08:23:36Z)
- On Stochastic Moving-Average Estimators for Non-Convex Optimization [105.22760323075008]
In this paper, we demonstrate the power of a widely used stochastic estimator based on moving average (SEMA) on a range of stochastic non-convex optimization problems.
We also present state-of-the-art results for these problems.
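For reference, a SEMA-style estimator is, at its core, an exponential moving average of stochastic gradients; a minimal sketch follows (the parameter name beta and its value are illustrative assumptions, not taken from that paper).

```python
import numpy as np

def sema_update(d_prev: np.ndarray, g_t: np.ndarray, beta: float = 0.1) -> np.ndarray:
    """One moving-average (SEMA-style) update of the gradient estimator:
    d_t = (1 - beta) * d_{t-1} + beta * g_t,
    where g_t is a fresh unbiased stochastic gradient at the current iterate."""
    return (1.0 - beta) * d_prev + beta * g_t
```

Conceptually, the hybrid estimator used by GT-HSGD above blends such a moving-average (momentum) term with a SARAH-type correction.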
arXiv Detail & Related papers (2021-04-30T08:50:24Z)
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute stochastic gradients and cooperate to jointly optimize over their local objective functions.
Our algorithm is based on a new concentration technique, and its sample complexity matches the best known bounds in the distributed setting when no Byzantine machines are present.
It is also very practical, improving upon the performance of all prior methods.
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
- Fast decentralized non-convex finite-sum optimization with recursive variance reduction [19.540926205375857]
We describe a first-order gradient method, called GT-SARAH, that employs a SARAH-type variance reduction technique.
In particular, in a big-data regime such that $n = O(N^{\frac{1}{2}}(1-\lambda)^{3})$, this gradient complexity reduces to $O(N^{\frac{1}{2}}L\epsilon^{-2})$, independent of the network topology.
In addition, we show appropriate choices of local minibatch size balance the trade-offs between gradient complexity and communication complexity.
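For context, the SARAH-type recursion that GT-SARAH builds on corrects the previous gradient estimate with a gradient difference at consecutive iterates; here is a minimal single-node sketch (the periodic full-gradient restarts and the decentralized gradient-tracking machinery are omitted, and the helper names are illustrative).

```python
import numpy as np

def sarah_update(v_prev: np.ndarray, stoch_grad, x_new: np.ndarray, x_old: np.ndarray) -> np.ndarray:
    """SARAH-type recursion: v_t = g(x_t) - g(x_{t-1}) + v_{t-1}.

    stoch_grad should evaluate the *same* freshly drawn minibatch at both
    x_new and x_old (e.g., a closure over a fixed sample); in the full method,
    v is periodically reset to an exact or large-batch gradient.
    """
    return stoch_grad(x_new) - stoch_grad(x_old) + v_prev
```

GT-HSGD's hybrid estimator interpolates between this recursion and a plain stochastic gradient, which is one reason it can remain single-loop without periodic restarts.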
arXiv Detail & Related papers (2020-08-17T15:51:32Z)
- Fully Asynchronous Policy Evaluation in Distributed Reinforcement Learning over Networks [14.636457985379746]
This paper proposes a fully asynchronous scheme for the policy evaluation problem of distributed reinforcement learning (DisRL) over directed peer-to-peer networks.
Without waiting for any other node of the network, each node can locally update its value function at any time by using (possibly delayed) information from its neighbors.
arXiv Detail & Related papers (2020-03-01T08:12:08Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for non-convex non-concave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.