GNNSampler: Bridging the Gap between Sampling Algorithms of GNN and
Hardware
- URL: http://arxiv.org/abs/2108.11571v1
- Date: Thu, 26 Aug 2021 04:13:52 GMT
- Title: GNNSampler: Bridging the Gap between Sampling Algorithms of GNN and
Hardware
- Authors: Xin Liu, Mingyu Yan, Shuhan Song, Zhengyang Lv, Wenming Li, Guangyu
Sun, Xiaochun Ye, Dongrui Fan
- Abstract summary: We propose a unified programming model for mainstream sampling algorithms, termed GNNSampler.
We explore the data locality among nodes and their neighbors in real-world datasets to alleviate irregular memory access during sampling.
Our method generalizes to mainstream sampling algorithms and reduces GNN training time.
- Score: 8.15489210461058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sampling is a critical operation in the training of Graph Neural
Networks (GNNs) that helps reduce training cost. Previous works have explored
improving sampling algorithms through mathematical and statistical methods.
However, there is a gap between sampling algorithms and hardware: without
considering hardware, algorithm designers optimize sampling only at the
algorithm level, missing the considerable potential for improving the
efficiency of existing sampling algorithms by leveraging hardware features. In
this paper, we first propose a unified programming model for mainstream
sampling algorithms, termed GNNSampler, which covers the key processes of
sampling algorithms across categories. Second, we explore the data locality
among nodes and their neighbors (i.e., a hardware-level feature) in real-world
datasets to alleviate irregular memory access during sampling. Third, we
implement locality-aware optimizations in GNNSampler for diverse sampling
algorithms to optimize the general sampling process in GNN training. Finally,
we conduct extensive experiments on large graph datasets to analyze the
relationship among training time, model accuracy, and hardware-level metrics,
which helps achieve a good trade-off between time and accuracy in GNN
training. The results show that our method generalizes to mainstream sampling
algorithms and reduces GNN training time (by 4.83% with layer-wise sampling
and up to 44.92% with subgraph-based sampling) with comparable accuracy.
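The abstract names two concrete mechanisms: a unified programming model covering the key sampling steps, and a locality-aware pass over the node/neighbor layout. No code accompanies this listing, so below is a minimal, hypothetical Python sketch of both ideas; the function names (build_adjacency, locality_aware_reorder, gnn_sampler) and the BFS-based relabeling heuristic are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict, deque

def build_adjacency(edges, num_nodes):
    """Undirected adjacency lists: node id -> sorted neighbor ids."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {n: sorted(adj[n]) for n in range(num_nodes)}

def locality_aware_reorder(adj):
    """Relabel nodes in BFS order so graph neighbors get nearby ids.

    A simple stand-in for the paper's locality exploration: nodes that
    are close in the graph end up close in memory, so gathering a
    sampled neighborhood touches fewer distant memory regions.
    """
    order, seen = [], set()
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    relabel = {old: new for new, old in enumerate(order)}
    return {relabel[u]: sorted(relabel[v] for v in nbrs)
            for u, nbrs in adj.items()}

def gnn_sampler(adj, seeds, num_layers, fanout, rng=random):
    """Unified sampling skeleton, instantiated here as node-wise
    (GraphSAGE-style) sampling: one hop of at most `fanout` neighbors
    per layer, starting from the seed mini-batch."""
    layers = [sorted(seeds)]
    frontier = set(seeds)
    for _ in range(num_layers):
        nxt = set()
        for u in frontier:
            nbrs = adj.get(u, [])
            nxt.update(rng.sample(nbrs, min(fanout, len(nbrs))))
        layers.append(sorted(nxt))
        frontier = nxt
    return layers  # per-layer node ids, seeds first

# Example: reorder once up front, then sample every mini-batch.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = locality_aware_reorder(build_adjacency(edges, num_nodes=6))
print(gnn_sampler(adj, seeds=[0], num_layers=2, fanout=2))
```

A natural reading of the abstract is that the reordering cost is paid once as preprocessing and then amortized across every mini-batch sampling pass during training.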
Related papers
- CBAGAN-RRT: Convolutional Block Attention Generative Adversarial Network
for Sampling-Based Path Planning [0.0]
We propose a novel image-based learning algorithm (CBAGAN-RRT) using a Convolutional Block Attention Generative Adversarial Network.
The probability distribution of the paths generated from our GAN model is used to guide the sampling process for the RRT algorithm.
We train and test our network on the dataset generated by Zhang et al. (2021) and demonstrate that our algorithm outperforms the previous state-of-the-art algorithms.
arXiv Detail & Related papers (2023-05-13T20:06:53Z) - Towards Automated Imbalanced Learning with Deep Hierarchical
Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z) - Calibrate and Debias Layer-wise Sampling for Graph Convolutional
Networks [39.56471534442315]
This paper revisits layer-wise sampling from a matrix approximation perspective.
We propose a new principle for constructing sampling probabilities and an efficient debiasing algorithm.
Improvements are demonstrated by extensive analyses of estimation variance and experiments on common benchmarks.
arXiv Detail & Related papers (2022-06-01T15:52:06Z) - Communication-Efficient Sampling for Distributed Training of Graph
Convolutional Networks [3.075766050800645]
Training Graph Convolutional Networks (GCNs) is expensive as it needs to aggregate data from neighboring nodes.
Previous works have proposed various neighbor sampling methods that estimate the aggregation result based on a small number of sampled neighbors.
We present an algorithm that determines the local sampling probabilities and ensures that the skewed neighbor sampling does not significantly affect training convergence; a generic sketch of this kind of reweighted neighbor estimator is given after this list.
arXiv Detail & Related papers (2021-01-19T16:12:44Z) - A Systematic Characterization of Sampling Algorithms for Open-ended
Language Generation [71.31905141672529]
We study the widely adopted ancestral sampling algorithms for auto-regressive language models.
We identify three key properties that are shared among them: entropy reduction, order preservation, and slope preservation.
We find that the set of sampling algorithms that satisfies these properties performs on par with the existing sampling algorithms.
arXiv Detail & Related papers (2020-09-15T17:28:42Z) - Nearly Linear Row Sampling Algorithm for Quantile Regression [54.75919082407094]
We give a row sampling algorithm for the quantile loss function with sample complexity nearly linear in the dimensionality of the data.
Based upon our row sampling algorithm, we give the fastest known algorithm for quantile regression and a graph sparsification algorithm for balanced directed graphs.
arXiv Detail & Related papers (2020-06-15T13:40:07Z) - Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z) - Learning with Optimized Random Features: Exponential Speedup by Quantum
Machine Learning without Sparsity and Low-Rank Assumptions [8.08640000394814]
We develop a quantum algorithm for sampling from an optimized distribution over random features.
Our algorithm achieves an exponential speedup in $D$ compared to any known classical algorithm for this sampling task.
arXiv Detail & Related papers (2020-04-22T18:00:00Z) - Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration to learn an accurate model of an environment, modeled as a Markov decision process (MDP).
We show that our Markov-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z) - Channel Assignment in Uplink Wireless Communication using Machine
Learning Approach [54.012791474906514]
This letter investigates a channel assignment problem in uplink wireless communication systems.
Our goal is to maximize the sum rate of all users subject to integer channel assignment constraints.
Due to the high computational complexity, machine learning approaches are employed to obtain computationally efficient solutions.
arXiv Detail & Related papers (2020-01-12T15:54:20Z)
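Several of the entries above, notably the communication-efficient neighbor sampling work, revolve around the same primitive: estimating a neighborhood aggregation from a few sampled neighbors without biasing training. As a generic illustration only, not code from any of the cited papers, here is a minimal Horvitz-Thompson-style sketch in Python; estimate_aggregation and its arguments are hypothetical names.

```python
import random

def estimate_aggregation(features, neighbors, probs, budget, rng=random):
    """Unbiased estimate of sum(features[v] for v in neighbors).

    Each neighbor v is kept independently with probability p_v (its raw
    weight probs[v], scaled so the expected number of kept neighbors is
    `budget`); each kept feature vector is reweighted by 1/p_v, which
    keeps the estimator unbiased (Horvitz-Thompson).
    """
    total = sum(probs[v] for v in neighbors)
    dim = len(next(iter(features.values())))
    est = [0.0] * dim
    for v in neighbors:
        p = min(1.0, budget * probs[v] / total)
        if rng.random() < p:
            w = 1.0 / p  # inverse-probability weight
            for i, x in enumerate(features[v]):
                est[i] += w * x
    return est

# Example: 4 neighbors, skewed inclusion weights, budget of ~2 samples.
feats = {v: [float(v), 1.0] for v in range(4)}
print(estimate_aggregation(feats, neighbors=[0, 1, 2, 3],
                           probs={0: 0.7, 1: 0.1, 2: 0.1, 3: 0.1},
                           budget=2))
```

The exact sum here is [6.0, 4.0]; averaging the estimator over many runs converges to it, which is the generic property such skewed samplers rely on when they bias sampling toward cheap (e.g., local) neighbors.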