Optimal Complexity in Non-Convex Decentralized Learning over
Time-Varying Networks
- URL: http://arxiv.org/abs/2211.00533v1
- Date: Tue, 1 Nov 2022 15:37:54 GMT
- Title: Optimal Complexity in Non-Convex Decentralized Learning over
Time-Varying Networks
- Authors: Xinmeng Huang and Kun Yuan
- Abstract summary: Decentralized optimization with time-varying networks is an emerging paradigm in machine learning.
It substantially reduces communication overhead in large-scale deep training and is more robust in wireless scenarios, especially when nodes are moving.
- Score: 8.860889476382594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized optimization with time-varying networks is an emerging paradigm
in machine learning. It substantially reduces communication overhead in
large-scale deep training and is more robust in wireless scenarios, especially
when nodes are moving. Federated learning can also be regarded as decentralized
optimization with time-varying communication patterns alternating between
global averaging and local updates.
While numerous studies exist to clarify its theoretical limits and develop
efficient algorithms, it remains unclear what the optimal complexity is for
non-convex decentralized stochastic optimization over time-varying networks.
The main difficulties lie in how to gauge the effectiveness when transmitting
messages between two nodes via time-varying communications, and how to
establish the lower bound when the network size is fixed (which is a
prerequisite in stochastic optimization). This paper resolves these challenges
and establishes the first lower bound on this complexity. We also develop a new
decentralized algorithm that nearly attains the lower bound, demonstrating both
the tightness of the bound and the optimality of our algorithm.
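The setting above can be made concrete with a short sketch. The following is a minimal illustration, not the paper's algorithm: at each round every node takes a local stochastic gradient step and then mixes its model with its neighbors through a time-varying doubly stochastic matrix W_t, and federated learning's alternation of local updates and global averaging corresponds to a schedule in which W_t switches between the identity and the uniform averaging matrix. All names and parameters below are illustrative assumptions.

```python
# Minimal sketch of decentralized SGD over a time-varying network (illustrative,
# not the algorithm proposed in the paper). Each row of x is one node's model.
import numpy as np

def decentralized_sgd(grad, x0, mixing_matrices, lr=0.1):
    """grad(i, x) returns a stochastic gradient of node i's local loss at x;
    x0 has shape (n_nodes, dim); mixing_matrices yields one doubly stochastic
    matrix W_t of shape (n_nodes, n_nodes) per communication round."""
    x = x0.copy()
    for W in mixing_matrices:
        g = np.stack([grad(i, x[i]) for i in range(x.shape[0])])
        x = W @ (x - lr * g)      # local step, then gossip with neighbors via W_t
    return x.mean(axis=0)         # average model across nodes

def fedavg_like_schedule(n_nodes, rounds, local_steps):
    """Time-varying pattern: yield the uniform averaging matrix every
    local_steps-th round (global averaging) and the identity otherwise
    (purely local updates), as in federated learning."""
    I = np.eye(n_nodes)
    avg = np.full((n_nodes, n_nodes), 1.0 / n_nodes)
    for t in range(rounds):
        yield avg if (t + 1) % local_steps == 0 else I
```

For example, `decentralized_sgd(grad, np.zeros((8, 10)), fedavg_like_schedule(8, rounds=100, local_steps=5))` runs the FedAvg-like pattern on 8 nodes, given a user-supplied `grad`.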
Related papers
- From promise to practice: realizing high-performance decentralized training [8.955918346078935]
Decentralized training of deep neural networks has attracted significant attention for its theoretically superior scalability over synchronous data-parallel methods like All-Reduce.
This paper identifies three key factors that can lead to speedups over All-Reduce training and constructs a runtime model to determine when, how, and to what degree decentralization can yield shorter per-iteration runtimes.
arXiv Detail & Related papers (2024-10-15T19:04:56Z)
- Lower Bounds and Optimal Algorithms for Non-Smooth Convex Decentralized Optimization over Time-Varying Networks [57.24087627267086]
We consider the task of minimizing the sum of convex functions stored in a decentralized manner across the nodes of a communication network.
Lower bounds on the number of decentralized communications and (sub)gradient computations required to solve the problem have been established.
We develop the first optimal algorithm that matches these lower bounds and offers substantially improved theoretical performance compared to the existing state of the art.
arXiv Detail & Related papers (2024-05-28T10:28:45Z) - Decentralized Bilevel Optimization over Graphs: Loopless Algorithmic
Update and Transient Iteration Complexity [38.54552875789929]
We introduce a single-loop decentralized stochastic bilevel optimization algorithm (D-SOBA) and establish its transient iteration complexity.
D-SOBA achieves the state-of-the-art rate, gradient/Hessian complexity, and transient iteration complexity under more relaxed assumptions.
arXiv Detail & Related papers (2024-02-05T16:35:30Z)
- Federated Multi-Level Optimization over Decentralized Networks [55.776919718214224]
We study the problem of distributed multi-level optimization over a network, where agents can only communicate with their immediate neighbors.
We propose a novel gossip-based distributed multi-level optimization algorithm that enables networked agents to solve optimization problems at different levels in a single timescale.
Our algorithm achieves optimal sample complexity, scaling linearly with the network size, and demonstrates state-of-the-art performance on various applications.
arXiv Detail & Related papers (2023-10-10T00:21:10Z)
- DIGEST: Fast and Communication Efficient Decentralized Learning with Local Updates [4.3707341422218215]
Two widely considered decentralized learning approaches are gossip-based and random walk-based learning (a minimal sketch contrasting the two update styles appears after this list).
We design DIGEST, a fast and communication-efficient asynchronous decentralized learning mechanism.
We evaluate single- and multi-stream DIGEST on logistic regression and the ResNet20 deep neural network.
arXiv Detail & Related papers (2023-07-14T22:58:20Z)
- DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization [43.31016937305845]
Internet-of-things, networked sensing, autonomous systems and federated learning call for decentralized algorithms for finite-sum optimizations.
We develop a DEcentralized STochastic REcurSive method (DESTRESS) for nonconvex finite-sum optimization.
Detailed theoretical and numerical comparisons show that DESTRESS improves upon prior decentralized algorithms.
arXiv Detail & Related papers (2021-10-04T03:17:41Z)
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
- Community detection using fast low-cardinality semidefinite programming [94.4878715085334]
We propose a new low-cardinality algorithm that generalizes the local update to maximize a semidefinite relaxation derived from Leiden-k-cut.
The proposed algorithm is scalable, outperforms state-of-the-art algorithms, and does so in real-world running time with little additional cost.
arXiv Detail & Related papers (2020-12-04T15:46:30Z)
- A Low Complexity Decentralized Neural Net with Centralized Equivalence using Layer-wise Learning [49.15799302636519]
We design a low-complexity decentralized learning algorithm to train a recently proposed large neural network across distributed processing nodes (workers).
In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns.
We show that it is possible to achieve equivalent learning performance as if the data is available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds in theory.
Experiments on several datasets demonstrate its effectiveness and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
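As a follow-up to the DIGEST entry above, the two decentralized update styles it mentions can be contrasted on a toy problem. This is a hedged, illustrative sketch on a hypothetical least-squares task over a ring of nodes; the data, step sizes, and schedule are assumptions, and this is not the DIGEST algorithm itself.

```python
# Toy contrast of gossip-based vs. random walk-based decentralized learning on a
# synthetic least-squares problem over a ring network (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n, dim = 8, 5
A = rng.normal(size=(n, 20, dim))      # node i privately holds (A[i], b[i])
b = rng.normal(size=(n, 20))
grad = lambda i, x: A[i].T @ (A[i] @ x - b[i]) / len(b[i])   # local gradient

shift = np.roll(np.eye(n), 1, axis=1)            # directed ring permutation
W = (np.eye(n) + shift + shift.T) / 3.0          # doubly stochastic gossip matrix

# Gossip: every node computes its local gradient each round and mixes with its
# two ring neighbors.
x_gossip = np.zeros((n, dim))
for _ in range(200):
    g = np.stack([grad(i, x_gossip[i]) for i in range(n)])
    x_gossip = W @ (x_gossip - 0.05 * g)

# Random walk: a single model hops along the ring, updated by one node at a time.
x_walk, node = np.zeros(dim), 0
for _ in range(200 * n):                         # comparable total gradient budget
    x_walk -= 0.05 * grad(node, x_walk)
    node = (node + rng.choice([-1, 1])) % n      # hand the model to a ring neighbor

# Small if both runs have (approximately) reached the shared minimizer.
print(np.linalg.norm(x_gossip.mean(axis=0) - x_walk))
```

Gossip spends one communication per neighbor per round across all nodes, while the random walk sends a single model per update; this difference in communication pattern is what local-update schemes such as DIGEST aim to balance.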