ADOM: Accelerated Decentralized Optimization Method for Time-Varying
Networks
- URL: http://arxiv.org/abs/2102.09234v1
- Date: Thu, 18 Feb 2021 09:37:20 GMT
- Title: ADOM: Accelerated Decentralized Optimization Method for Time-Varying
Networks
- Authors: Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Alexander Rogozin,
Alexander Gasnikov
- Abstract summary: We propose ADOM - an accelerated method for smooth and strongly convex decentralized optimization over time-varying networks.
Up to a constant factor, which depends on the network structure only, its communication complexity is the same as that of the accelerated Nesterov gradient method.
- Score: 124.33353902111939
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose ADOM - an accelerated method for smooth and strongly convex
decentralized optimization over time-varying networks. ADOM uses a dual oracle,
i.e., we assume access to the gradient of the Fenchel conjugate of the
individual loss functions. Up to a constant factor, which depends on the
network structure only, its communication complexity is the same as that of
the accelerated Nesterov gradient method (Nesterov, 2003). To the best of our
knowledge, only the algorithm of Rogozin et al. (2019) has a convergence rate
with similar properties. However, their algorithm converges under the very
restrictive assumption that the number of network changes cannot be greater
than a tiny percentage of the number of iterations. This assumption is hard to
satisfy in practice, as changes in the network topology usually cannot be
controlled. In contrast, ADOM merely requires the network to stay connected
throughout time.
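To make the dual-oracle assumption concrete, below is a minimal sketch of what access to the gradient of the Fenchel conjugate means, using a quadratic local loss for which the conjugate gradient has a closed form; the function name and the quadratic choice are illustrative assumptions, not code from the paper.

```python
import numpy as np

# Illustration only: for a strongly convex quadratic local loss
#   f_i(x) = 0.5 * x^T A x - b^T x,  with A symmetric positive definite,
# the Fenchel conjugate is f_i^*(y) = 0.5 * (y + b)^T A^{-1} (y + b),
# so the dual oracle (the gradient of the conjugate) is A^{-1} (y + b).

def grad_conjugate_quadratic(A, b, y):
    """Dual oracle for f(x) = 0.5 x^T A x - b^T x: returns grad f^*(y)."""
    return np.linalg.solve(A, y + b)

# Sanity check: grad f^*(y) is the point x_star at which grad f(x_star) = y.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + np.eye(5)                 # symmetric positive definite
b = rng.standard_normal(5)
y = rng.standard_normal(5)

x_star = grad_conjugate_quadratic(A, b, y)
assert np.allclose(A @ x_star - b, y)   # i.e. grad f(x_star) == y
```

In general, the dual oracle returns the maximizer of <x, y> - f_i(x), so when no closed form is available it amounts to solving a local subproblem at each node.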
Related papers
- Decentralized Optimization in Time-Varying Networks with Arbitrary Delays [22.40154714677385]
We consider a decentralized optimization problem for networks affected by communication delays.
Examples of such networks include collaborative machine learning, sensor networks, and multi-agent systems.
To mimic communication delays, we add virtual non-computing nodes to the network, resulting in directed graphs.
arXiv Detail & Related papers (2024-05-29T20:51:38Z)
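As a rough, hypothetical illustration of the virtual-node construction mentioned in the entry above (the edge-list representation and function name below are assumptions, not the paper's implementation), a delay of d steps on a link can be modeled by inserting a chain of d non-computing relay nodes, which makes the augmented graph directed.

```python
# Hypothetical sketch: model a communication delay of `delay` steps on the
# directed link (u, v) by inserting `delay` virtual, non-computing relay
# nodes that each forward the message one step later.
def add_delayed_edge(edges, num_nodes, u, v, delay):
    """Return (edges, num_nodes) with the link u -> v replaced by a relay chain."""
    if delay <= 0:
        edges.append((u, v))
        return edges, num_nodes
    prev = u
    for _ in range(delay):
        relay = num_nodes              # index of a new virtual (non-computing) node
        num_nodes += 1
        edges.append((prev, relay))
        prev = relay
    edges.append((prev, v))            # final hop delivers the message to v
    return edges, num_nodes

# Example: 3 computing nodes; link 0 -> 1 has delay 2, link 1 -> 2 has no delay.
edges, n = [], 3
edges, n = add_delayed_edge(edges, n, 0, 1, delay=2)
edges, n = add_delayed_edge(edges, n, 1, 2, delay=0)
print(edges, n)                        # [(0, 3), (3, 4), (4, 1), (1, 2)] and 5 nodes
```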
- Decentralized Stochastic Proximal Gradient Descent with Variance Reduction over Time-varying Networks [30.231314171218994]
In decentralized learning, a network of nodes cooperates to minimize an overall objective function, which is usually the finite sum of their local objectives.
We propose a novel algorithm, DPSVRG, which accelerates decentralized training by leveraging the variance reduction technique.
arXiv Detail & Related papers (2021-12-20T08:23:36Z)
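The DPSVRG entry above leverages variance reduction; the sketch below shows a generic SVRG-style variance-reduced gradient estimator on a local least-squares finite sum (the standard technique in general form, not DPSVRG itself; all names and the loss choice are illustrative assumptions).

```python
import numpy as np

# Local finite sum f(x) = (1/m) * sum_j f_j(x) with f_j(x) = 0.5 * (a_j @ x - b_j)^2.
def component_grad(a_j, b_j, x):
    return (a_j @ x - b_j) * a_j

def svrg_estimator(a, b, x, snapshot, full_grad_at_snapshot, j):
    """v_j = grad f_j(x) - grad f_j(snapshot) + full gradient at the snapshot."""
    return (component_grad(a[j], b[j], x)
            - component_grad(a[j], b[j], snapshot)
            + full_grad_at_snapshot)

rng = np.random.default_rng(0)
m, dim = 50, 4
a = rng.standard_normal((m, dim))
b = rng.standard_normal(m)
x = rng.standard_normal(dim)
snapshot = x.copy()                     # snapshot (reference) point
full_grad = np.mean([component_grad(a[j], b[j], snapshot) for j in range(m)], axis=0)

j = rng.integers(m)
v = svrg_estimator(a, b, x, snapshot, full_grad, j)
# The estimator is unbiased (its expectation over j is the full gradient at x),
# and its variance shrinks as x approaches the snapshot point.
```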
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We establish lower complexity bounds for this setting and design two optimal algorithms that attain them.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
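For the decentralized setting described in the entry above, the following is a minimal baseline sketch: plain decentralized gradient descent with gossip averaging over a fixed ring topology. This is an assumed illustration of the problem setup, not one of the paper's optimal algorithms.

```python
import numpy as np

# Each node i holds a local loss f_i(x) = 0.5 * ||x - c_i||^2; the global goal
# is to minimize their sum, whose minimizer is the average of the c_i.
rng = np.random.default_rng(0)
n_nodes, dim, step = 5, 3, 0.1
c = rng.standard_normal((n_nodes, dim))      # local data
x = np.zeros((n_nodes, dim))                 # one iterate per node

# Doubly stochastic mixing matrix for a ring graph (uniform neighbor weights).
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    for j in (i - 1, i, i + 1):
        W[i, j % n_nodes] = 1.0 / 3.0

for _ in range(200):
    grad = x - c                             # gradient of each local f_i
    x = W @ x - step * grad                  # gossip averaging + local gradient step

# With a constant step size, plain decentralized gradient descent only reaches
# a neighborhood of the true minimizer; the accelerated and optimal methods
# discussed in these papers are designed to improve on this baseline.
print(np.linalg.norm(x - c.mean(axis=0), axis=1))
```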
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, where it shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in their depth, in time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Distributed Optimization, Averaging via ADMM, and Network Topology [0.0]
We study the connection between network topology and convergence rates for different algorithms on a real-world sensor localization problem.
We also show interesting connections between ADMM and lifted Markov chains, besides providing an explicit characterization of its convergence.
arXiv Detail & Related papers (2020-09-05T21:44:39Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our algorithm requires a much smaller number of communication rounds while retaining its theoretical guarantees.
Experiments on several datasets demonstrate the effectiveness of the algorithm and corroborate the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.