Communication-Efficient Sampling for Distributed Training of Graph
Convolutional Networks
- URL: http://arxiv.org/abs/2101.07706v1
- Date: Tue, 19 Jan 2021 16:12:44 GMT
- Title: Communication-Efficient Sampling for Distributed Training of Graph
Convolutional Networks
- Authors: Peng Jiang, Masuma Akter Rumi
- Abstract summary: Training Graph Convolutional Networks (GCNs) is expensive as it needs to aggregate data from neighboring nodes.
Previous works have proposed various neighbor sampling methods that estimate the aggregation result based on a small number of sampled neighbors.
We present an algorithm that determines the local sampling probabilities and ensures that our skewed neighbor sampling does not significantly affect the convergence of training.
- Score: 3.075766050800645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training Graph Convolutional Networks (GCNs) is expensive as it needs to
aggregate data recursively from neighboring nodes. To reduce the computation
overhead, previous works have proposed various neighbor sampling methods that
estimate the aggregation result based on a small number of sampled neighbors.
Although these methods have successfully accelerated the training, they mainly
focus on the single-machine setting. As real-world graphs are large, training
GCNs in distributed systems is desirable. However, we found that the existing
neighbor sampling methods do not work well in a distributed setting.
Specifically, a naive implementation may incur a huge amount of communication
of feature vectors among different machines. To address this problem, we
propose a communication-efficient neighbor sampling method in this work. Our
main idea is to assign higher sampling probabilities to the local nodes so that
remote nodes are accessed less frequently. We present an algorithm that
determines the local sampling probabilities and ensures that our skewed neighbor
sampling does not significantly affect the convergence of training. Our experiments
with node classification benchmarks show that our method significantly reduces
the communication overhead for distributed GCN training with little accuracy
loss.
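To illustrate the core idea, here is a minimal Python/NumPy sketch (not taken from the paper) of skewed neighbor sampling with importance weighting: neighbors stored on the local machine are drawn with a boosted probability, and per-sample weights correct for the skew so the aggregation estimate stays unbiased. The `local_boost` factor and all function names are illustrative assumptions; the paper's actual contribution is an algorithm that chooses the local sampling probabilities so that convergence is preserved, which is not reproduced here.

```python
import numpy as np

def skewed_neighbor_sample(neighbor_ids, is_local, num_samples, local_boost=4.0, rng=None):
    """Draw `num_samples` neighbors with probabilities skewed toward local nodes.

    `local_boost` is a hypothetical knob; the paper derives the local
    probabilities from a convergence-aware algorithm not shown here.
    Returns sampled neighbor ids and importance weights that keep the
    aggregation estimate unbiased.
    """
    rng = np.random.default_rng() if rng is None else rng
    neighbor_ids = np.asarray(neighbor_ids)
    # Uniform base weight, boosted for neighbors stored on this machine.
    raw = np.where(np.asarray(is_local, dtype=bool), local_boost, 1.0)
    probs = raw / raw.sum()
    picks = rng.choice(len(neighbor_ids), size=num_samples, replace=True, p=probs)
    # Dividing by (n * p_i) corrects for the skew, so the weighted mean
    # below is an unbiased estimate of the uniform neighbor average.
    importance = 1.0 / (len(neighbor_ids) * probs[picks])
    return neighbor_ids[picks], importance

def estimate_aggregation(features, sampled_ids, importance):
    """Estimate the mean of neighbor feature vectors from the skewed sample."""
    return (importance[:, None] * features[sampled_ids]).mean(axis=0)

# Toy usage: 6 neighbors, the first 3 stored locally. Remote ids 3-5 are
# sampled less often, so fewer feature vectors cross the network.
features = np.random.randn(6, 8)
ids, w = skewed_neighbor_sample(np.arange(6), [1, 1, 1, 0, 0, 0], num_samples=3)
aggregated = estimate_aggregation(features, ids, w)
```

In a distributed run, each machine would apply such a sampler to the neighbor lists of its own training nodes, fetching remote feature vectors only for the rarer remote samples; how far the probabilities can be skewed without hurting convergence is exactly what the paper's algorithm decides.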
Related papers
- Learning conditional distributions on continuous spaces [0.0]
We investigate sample-based learning of conditional distributions on multi-dimensional unit boxes.
We employ two distinct clustering schemes: one based on a fixed-radius ball and the other on nearest neighbors.
We propose to incorporate the nearest neighbors method into neural network training, as our empirical analysis indicates it has better performance in practice.
arXiv Detail & Related papers (2024-06-13T17:53:47Z)
- Faster Convergence with Less Communication: Broadcast-Based Subgraph Sampling for Decentralized Learning over Wireless Networks [32.914407967052114]
BASS is a broadcast-based subgraph sampling method designed to accelerate the convergence of D-SGD.
We show that BASS enables faster convergence with fewer transmission slots compared to existing link-based scheduling methods.
arXiv Detail & Related papers (2024-01-24T20:00:23Z)
- Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
arXiv Detail & Related papers (2023-08-31T12:22:40Z)
- A Maximum Log-Likelihood Method for Imbalanced Few-Shot Learning Tasks [3.2895195535353308]
We propose a new maximum log-likelihood metric for few-shot architectures.
We demonstrate that the proposed metric achieves superior performance accuracy w.r.t. conventional similarity metrics.
We also show that our algorithm achieves state-of-the-art transductive few-shot performance when the evaluation data is imbalanced.
arXiv Detail & Related papers (2022-11-26T21:31:00Z)
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning [58.6792963686231]
Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data.
It is not clear how to choose the WNs' minimum update directions, the first minibatch sizes, and the local update frequency.
We show that there is a trade-off curve between local update frequencies and local minibatch sizes, on which the above complexities can be maintained.
arXiv Detail & Related papers (2021-06-19T06:13:45Z)
- Improving Generative Adversarial Networks with Local Coordinate Coding [150.24880482480455]
Generative adversarial networks (GANs) have shown remarkable success in generating realistic data from some predefined prior distribution.
In practice, semantic information might be represented by some latent distribution learned from data.
We propose an LCCGAN model with local coordinate coding (LCC) to improve the performance of generating data.
arXiv Detail & Related papers (2020-07-28T09:17:50Z)
- Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolutional Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z)
- One Size Fits All: Can We Train One Denoiser for All Noise Levels? [13.46272057205994]
It is often preferred to train one neural network estimator and apply it to all noise levels.
The de facto protocol is to train the estimator with noisy samples whose noise levels are uniformly distributed.
This paper addresses the sampling problem from a minimax risk optimization perspective.
arXiv Detail & Related papers (2020-05-19T17:56:04Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)