Beyond Exponential Graph: Communication-Efficient Topologies for
Decentralized Learning via Finite-time Convergence
- URL: http://arxiv.org/abs/2305.11420v2
- Date: Sun, 15 Oct 2023 07:43:30 GMT
- Title: Beyond Exponential Graph: Communication-Efficient Topologies for
Decentralized Learning via Finite-time Convergence
- Authors: Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, Makoto Yamada
- Abstract summary: We propose a novel topology combining both a fast consensus rate and small maximum degree.
The Base-$(k + 1)$ Graph endows Decentralized SGD (DSGD) with both a faster convergence rate and greater communication efficiency than the exponential graph.
- Score: 31.933103173481964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized learning has recently been attracting increasing attention for
its applications in parallel computation and privacy preservation. Many recent
studies have shown that an underlying network topology with a faster consensus
rate (a.k.a. spectral gap) leads to a better convergence rate and accuracy for
decentralized learning. However, a topology with a fast consensus rate, e.g.,
the exponential graph, generally has a large maximum degree, which incurs
significant communication costs. Thus, seeking topologies with both a fast
consensus rate and a small maximum degree is important. In this study, we
propose the Base-$(k + 1)$ Graph, a novel topology that combines a fast
consensus rate with a small maximum degree. Unlike existing topologies, the
Base-$(k + 1)$ Graph enables all nodes to reach exact consensus after a finite
number of iterations for any number of nodes and any maximum degree $k$. Thanks to this
favorable property, the Base-$(k + 1)$ Graph endows Decentralized SGD (DSGD)
with both a faster convergence rate and greater communication efficiency than the
exponential graph. We conducted experiments with various topologies,
demonstrating that the Base-$(k + 1)$ Graph enables various decentralized
learning methods to achieve higher accuracy with better communication
efficiency than existing topologies.
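To illustrate why the consensus rate matters, here is a minimal gossip-averaging sketch (not the paper's Base-$(k + 1)$ construction): nodes on a ring repeatedly average with their neighbors through a doubly stochastic matrix $W$, and the per-node disagreement decays at a rate set by the second-largest eigenvalue of $W$, i.e., by the spectral gap.

```python
import numpy as np

def ring_mixing_matrix(n: int) -> np.ndarray:
    """Doubly stochastic mixing matrix for a ring: each node puts
    weight 1/3 on itself and on each of its two neighbors."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0
    return W

n = 16
rng = np.random.default_rng(0)
x = rng.normal(size=n)      # one scalar parameter per node
target = x.mean()           # the value exact consensus would reach
W = ring_mixing_matrix(n)

for _ in range(200):        # one gossip round: x <- W x
    x = W @ x

# the disagreement shrinks like lambda_2(W)^t but never hits zero exactly
print(float(np.max(np.abs(x - target))))
```

On a ring $\lambda_2(W)$ is close to 1, so many rounds are needed and the error is only approached asymptotically; the point of finite-time topologies such as the Base-$(k + 1)$ Graph is that the error vanishes exactly after finitely many rounds.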
Related papers
- Strongly Topology-preserving GNNs for Brain Graph Super-resolution [5.563171090433323]
Brain graph super-resolution (SR) is an under-explored yet highly relevant task in network neuroscience.
Current SR methods leverage graph neural networks (GNNs) thanks to their ability to handle graph-structured datasets.
We develop an efficient mapping from the edge space of our low-resolution (LR) brain graphs to the node space of a high-resolution (HR) dual graph.
arXiv Detail & Related papers (2024-11-01T03:29:04Z)
- xNeuSM: Explainable Neural Subgraph Matching with Graph Learnable Multi-hop Attention Networks [2.084955943646144]
Subgraph matching is a challenging problem with a wide range of applications in database systems, biochemistry, and cognitive science.
Traditional graph-matching algorithms provide precise results but face challenges on large graph instances due to the NP-completeness of the problem.
This article introduces Graph Learnable Multi-hop Attention Networks (GLeMA) that adaptively learns the parameters governing the attention factor decay for each node across hops.
arXiv Detail & Related papers (2023-12-04T04:03:30Z)
- NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes.
The efficient computation is enabled by a kernelized Gumbel-Softmax operator.
Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs.
arXiv Detail & Related papers (2023-06-14T09:21:15Z)
- Beyond spectral gap (extended): The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
Current theory does not explain why collaboration enables larger learning rates than training alone.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization.
arXiv Detail & Related papers (2023-01-05T16:53:38Z)
- Discovering the Representation Bottleneck of Graph Neural Networks from Multi-order Interactions [51.597480162777074]
Graph neural networks (GNNs) rely on the message passing paradigm to propagate node features and build interactions.
Recent works point out that different graph learning tasks require different ranges of interactions between nodes.
We study two common graph construction methods in scientific domains, i.e., K-nearest neighbor (KNN) graphs and fully-connected (FC) graphs.
arXiv Detail & Related papers (2022-05-15T11:38:14Z)
- Effects of Graph Convolutions in Deep Networks [8.937905773981702]
We present a rigorous theoretical understanding of the effects of graph convolutions in multi-layer networks.
We show that a single graph convolution expands the regime of the distance between the means where multi-layer networks can classify the data.
We provide both theoretical and empirical insights into the performance of graph convolutions placed in different combinations among the layers of a network.
arXiv Detail & Related papers (2022-04-20T08:24:43Z)
- Exponential Graph is Provably Efficient for Decentralized Deep Training [30.817705471352614]
We study so-called exponential graphs, in which every node is connected to $O(\log(n))$ neighbors, where $n$ is the total number of nodes.
This work proves such graphs can lead to both fast communication and effective averaging simultaneously.
We also discover that a sequence of $\log(n)$ one-peer exponential graphs, in which each node communicates with a single neighbor per iteration, can together achieve exact averaging.
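This exact-averaging property of one-peer exponential graphs is easy to check numerically. A minimal sketch, assuming for simplicity that the node count $n$ is a power of two and each node holds a scalar (in round $t$, node $i$ averages with node $(i + 2^t) \bmod n$):

```python
import numpy as np

def one_peer_exponential_averaging(x: np.ndarray) -> np.ndarray:
    """Run log2(n) one-peer rounds: in round t, node i averages its
    value with that of node (i + 2^t) mod n.  For n a power of two
    this reaches the exact average after log2(n) rounds."""
    n = len(x)
    assert n & (n - 1) == 0, "n must be a power of two"
    x = x.copy()
    t = 1
    while t < n:
        # np.roll(x, -t)[i] == x[(i + t) % n]: each node averages
        # with exactly one peer at hop distance t = 1, 2, 4, ...
        x = (x + np.roll(x, -t)) / 2
        t *= 2
    return x

rng = np.random.default_rng(1)
x0 = rng.normal(size=8)
x = one_peer_exponential_averaging(x0)
print(np.allclose(x, x0.mean()))   # prints True: exact consensus in 3 rounds
```

Note the actual one-peer exponential graphs in the paper are directed (each node sends to one peer and receives from a different one); the vectorized `np.roll` above models exactly that pattern.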
arXiv Detail & Related papers (2021-10-26T02:33:39Z)
- ACE-HGNN: Adaptive Curvature Exploration Hyperbolic Graph Neural Network [72.16255675586089]
We propose an Adaptive Curvature Exploration Hyperbolic Graph Neural Network named ACE-HGNN to adaptively learn the optimal curvature according to the input graph and downstream tasks.
Experiments on multiple real-world graph datasets demonstrate a significant and consistent performance improvement in model quality with competitive performance and good generalization ability.
arXiv Detail & Related papers (2021-10-15T07:18:57Z)
- Multi-hop Graph Convolutional Network with High-order Chebyshev Approximation for Text Reasoning [15.65069702939315]
We define the spectral graph convolutional network with the high-order dynamic Chebyshev approximation (HDGCN).
To alleviate the over-smoothing in high-order Chebyshev approximation, a multi-vote-based cross-attention (MVCAttn) with linear computation complexity is also proposed.
arXiv Detail & Related papers (2021-06-08T07:49:43Z)
- Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
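PPRGo's core idea, replacing many message-passing hops with (approximate) personalized PageRank scores, can be sketched with plain power iteration; the model itself uses a faster push-based approximation and then weights neighbor predictions by these scores. A toy version with an illustrative 4-node ring graph (function name and setup are assumptions, not PPRGo's API):

```python
import numpy as np

def personalized_pagerank(A: np.ndarray, seed: int,
                          alpha: float = 0.15, iters: int = 100) -> np.ndarray:
    """Personalized PageRank via power iteration:
        pi <- alpha * e_seed + (1 - alpha) * P^T pi,
    where P is the row-stochastic random-walk matrix of A."""
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)   # row-normalize adjacency
    e = np.zeros(n)
    e[seed] = 1.0
    pi = e.copy()
    for _ in range(iters):
        pi = alpha * e + (1 - alpha) * (P.T @ pi)
    return pi

# undirected 4-node ring
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
pi = personalized_pagerank(A, seed=0)
print(round(float(pi.sum()), 6))   # a probability distribution: prints 1.0
```

The teleport term concentrates mass near the seed node, which is why these scores can stand in for multi-hop neighborhood aggregation.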
arXiv Detail & Related papers (2020-07-03T09:30:07Z)
- Quantized Decentralized Stochastic Learning over Directed Graphs [52.94011236627326]
We consider a decentralized learning problem where data points are distributed among computing nodes communicating over a directed graph.
As the model size gets large, decentralized learning faces a major bottleneck that is the communication load due to each node transmitting messages (model updates) to its neighbors.
We propose the quantized decentralized learning algorithm over directed graphs that is based on the push-sum algorithm in decentralized consensus optimization.
arXiv Detail & Related papers (2020-02-23T18:25:39Z)
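The push-sum mechanism that the last paper builds on can be sketched directly: each node keeps a value and a weight, pushes equal shares of both along its out-edges (keeping one share for itself), and estimates the average by their ratio. This works on directed graphs where no doubly stochastic mixing matrix may exist. A minimal sketch with an illustrative directed ring, quantization omitted:

```python
import numpy as np

def push_sum_average(x0, out_neighbors, iters=300):
    """Push-sum gossip on a directed graph.  Node i holds (s_i, w_i),
    splits both equally among itself and its out-neighbors each round,
    and estimates the network average as s_i / w_i."""
    n = len(x0)
    s = np.array(x0, dtype=float)
    w = np.ones(n)
    for _ in range(iters):
        s_new = np.zeros(n)
        w_new = np.zeros(n)
        for i in range(n):
            targets = [i] + list(out_neighbors[i])  # keep one share for self
            share = 1.0 / len(targets)
            for j in targets:
                s_new[j] += share * s[i]
                w_new[j] += share * w[i]
        s, w = s_new, w_new
    return s / w   # each entry converges to mean(x0)

# directed ring: node i sends only to node (i + 1) % n
n = 6
out_neighbors = {i: [(i + 1) % n] for i in range(n)}
x0 = np.arange(n, dtype=float)
est = push_sum_average(x0, out_neighbors)
print(np.allclose(est, x0.mean()))   # prints True
```

The self-share makes the underlying column-stochastic update aperiodic, so the ratio estimates converge on any strongly connected directed graph; the quantized algorithm in the paper additionally compresses the transmitted shares.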
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.