Exponential Graph is Provably Efficient for Decentralized Deep Training
- URL: http://arxiv.org/abs/2110.13363v1
- Date: Tue, 26 Oct 2021 02:33:39 GMT
- Title: Exponential Graph is Provably Efficient for Decentralized Deep Training
- Authors: Bicheng Ying, Kun Yuan, Yiming Chen, Hanbin Hu, Pan Pan, Wotao Yin
- Abstract summary: We study so-called exponential graphs where every node is connected to $O(\log(n))$ neighbors and $n$ is the total number of nodes.
This work proves such graphs can lead to both fast communication and effective averaging simultaneously.
We also discover that a sequence of $\log(n)$ one-peer exponential graphs, in which each node communicates with a single neighbor per iteration, can together achieve exact averaging.
- Score: 30.817705471352614
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decentralized SGD is an emerging training method for deep learning, known for
requiring much less (and thus faster) communication per iteration, which it
achieves by relaxing the averaging step in parallel SGD to inexact averaging.
The less exact the averaging, however, the more iterations the training needs.
Therefore, the key to making decentralized SGD efficient is to realize
nearly-exact averaging using little communication. This requires a skillful
choice of communication topology, which is an under-studied topic in
decentralized optimization.
In this paper, we study so-called exponential graphs where every node is
connected to $O(\log(n))$ neighbors and $n$ is the total number of nodes. This
work proves such graphs can lead to both fast communication and effective
averaging simultaneously. We also discover that a sequence of $\log(n)$
one-peer exponential graphs, in which each node communicates with a single
neighbor per iteration, can together achieve exact averaging. This favorable
property enables the one-peer exponential graph to average as effectively as its
static counterpart while communicating more efficiently. We apply these
exponential graphs in decentralized (momentum) SGD to obtain the
state-of-the-art balance between per-iteration communication and iteration
complexity among all commonly-used topologies. Experimental results on a
variety of tasks and models demonstrate that decentralized (momentum) SGD over
exponential graphs promises both fast and high-quality training. Our code is
implemented through BlueFog and available at
https://github.com/Bluefog-Lib/NeurIPS2021-Exponential-Graph.
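To make the topology and averaging claims above concrete, here is a minimal NumPy sketch (an illustration under assumptions, not the authors' BlueFog implementation). It assumes the common exponential-graph neighbor rule in which node $i$ is linked to $(i + 2^k) \bmod n$ for $k = 0, \dots, \log_2(n)-1$, with $n$ a power of two, and checks numerically that the sequence of $\log_2(n)$ one-peer exponential graphs reaches the exact global average.

```python
import numpy as np


def static_exp_neighbors(i, n):
    """Neighbors of node i in the static exponential graph: hops of
    1, 2, 4, ..., n/2 around a ring of n nodes, i.e. O(log n) peers."""
    tau = int(np.log2(n))
    return [(i + 2 ** k) % n for k in range(tau)]


def one_peer_exact_average(x):
    """Apply the sequence of log2(n) one-peer exponential graphs.

    At round k every node i averages with the single peer (i + 2^k) mod n.
    For n a power of two, the product of these one-peer mixing matrices is
    the all-ones averaging matrix, so every node ends with the global mean.
    """
    n = x.shape[0]
    for k in range(int(np.log2(n))):
        # x[i] <- (x[i] + x[(i + 2^k) % n]) / 2, done for all nodes at once
        x = 0.5 * (x + np.roll(x, -(2 ** k), axis=0))
    return x


if __name__ == "__main__":
    n, d = 8, 3                      # 8 nodes, 3-dimensional local models
    rng = np.random.default_rng(0)
    x = rng.normal(size=(n, d))      # one row of parameters per node

    print(static_exp_neighbors(0, n))          # [1, 2, 4] when n = 8
    mixed = one_peer_exact_average(x.copy())
    # After log2(8) = 3 one-peer rounds every node holds the exact average.
    assert np.allclose(mixed, x.mean(axis=0, keepdims=True))
    print("exact averaging reached after", int(np.log2(n)), "one-peer rounds")
```

In decentralized (momentum) SGD, each such one-peer round would be interleaved with a local gradient step, so per-iteration communication stays at a single neighbor while, over a window of $\log_2(n)$ iterations, the averaging quality matches that of the static exponential graph.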
Related papers
- Distributed Training of Large Graph Neural Networks with Variable Communication Rates [71.7293735221656]
Training Graph Neural Networks (GNNs) on large graphs presents unique challenges due to the large memory and computing requirements.
Distributed GNN training, where the graph is partitioned across multiple machines, is a common approach to training GNNs on large graphs.
We introduce a variable compression scheme for reducing the communication volume in distributed GNN training without compromising the accuracy of the learned model.
arXiv Detail & Related papers (2024-06-25T14:57:38Z)
- Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN)
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
- Beyond Exponential Graph: Communication-Efficient Topologies for Decentralized Learning via Finite-time Convergence [31.933103173481964]
We propose a novel topology combining both a fast consensus rate and small maximum degree.
The Base-$(k + 1)$ Graph endows Decentralized SGD (DSGD) with both a faster convergence rate and more communication efficiency than the exponential graph.
arXiv Detail & Related papers (2023-05-19T04:08:07Z)
- Communication-Efficient Topologies for Decentralized Learning with $O(1)$ Consensus Rate [35.698182247676414]
Decentralized optimization is an emerging paradigm in distributed learning in which agents achieve network-wide solutions by peer-to-peer communication without the central server.
We show that the total number of iterations to reach a network-wide solution is affected by the speed at which the agents' information is "mixed" by communication.
We propose a new family of topologies, EquiTopo, which has an (almost) constant degree and a network-size-independent consensus rate.
arXiv Detail & Related papers (2022-10-14T15:02:01Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- BlueFog: Make Decentralized Algorithms Practical for Optimization and Deep Learning [29.427785235669358]
We introduce BlueFog, a python library for straightforward, high-performance implementations of decentralized algorithms.
Based on a unified abstraction of various communication operations, BlueFog offers intuitive interfaces to implement a spectrum of decentralized algorithms.
BlueFog reaches a much higher throughput and achieves an overall $1.2\times \sim 1.8\times$ speedup over Horovod, a state-of-the-art distributed deep learning package.
arXiv Detail & Related papers (2021-11-08T06:06:39Z)
- Accurate, Efficient and Scalable Training of Graph Neural Networks [9.569918335816963]
Graph Neural Networks (GNNs) are powerful deep learning models to generate node embeddings on graphs.
It is still challenging to perform training in an efficient and scalable way.
We propose a novel parallel training framework that reduces training workload by orders of magnitude compared with state-of-the-art minibatch methods.
arXiv Detail & Related papers (2020-10-05T22:06:23Z)
- Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
arXiv Detail & Related papers (2020-07-03T09:30:07Z)
- Quantized Decentralized Stochastic Learning over Directed Graphs [52.94011236627326]
We consider a decentralized learning problem where data points are distributed among computing nodes communicating over a directed graph.
As the model size gets large, decentralized learning faces a major bottleneck that is the communication load due to each node transmitting messages (model updates) to its neighbors.
We propose the quantized decentralized learning algorithm over directed graphs that is based on the push-sum algorithm in decentralized consensus optimization.
arXiv Detail & Related papers (2020-02-23T18:25:39Z)
- Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximative framework to such non-trivial ERGs that result in dyadic independence (i.e., edge independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)