GLISP: A Scalable GNN Learning System by Exploiting Inherent Structural
Properties of Graphs
- URL: http://arxiv.org/abs/2401.03114v1
- Date: Sat, 6 Jan 2024 02:59:24 GMT
- Title: GLISP: A Scalable GNN Learning System by Exploiting Inherent Structural
Properties of Graphs
- Authors: Zhongshu Zhu, Bin Jing, Xiaopei Wan, Zhizhen Liu, Lei Liang, Jun Zhou
- Abstract summary: We propose GLISP, a sampling-based GNN learning system for industrial-scale graphs.
GLISP consists of three core components: a graph partitioner, a graph sampling service, and a graph inference engine.
Experiments show that GLISP achieves up to $6.53\times$ and $70.77\times$ speedups over existing GNN systems for training and inference tasks, respectively.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As a powerful tool for modeling graph data, Graph Neural Networks
(GNNs) have received increasing attention in both academia and industry.
Nevertheless, it is notoriously difficult to deploy GNNs on industrial-scale
graphs due to their huge data size and complex topological structures. In this
paper, we propose GLISP, a sampling-based GNN learning system for
industrial-scale graphs. By exploiting inherent structural properties of
graphs, such as the power-law degree distribution and data locality, GLISP
addresses the scalability and performance issues that arise at different
stages of the graph learning process. GLISP consists of three core components:
a graph partitioner, a graph sampling service, and a graph inference engine.
The graph partitioner adopts the proposed vertex-cut graph partitioning
algorithm AdaDNE to produce balanced partitions for power-law graphs, which is
essential for sampling-based GNN systems. The graph sampling service employs a
load-balancing design that allows the one-hop sampling requests of high-degree
vertices to be handled by multiple servers; combined with a memory-efficient
data structure, this design substantially improves efficiency and scalability.
The graph inference engine splits a $K$-layer GNN into $K$ slices and caches
the vertex embeddings produced by each slice in a data-locality-aware hybrid
caching system for reuse, completely eliminating the redundant computation
caused by the data dependencies of the graph. Extensive experiments show that
GLISP achieves up to $6.53\times$ and $70.77\times$ speedups over existing GNN
systems for training and inference tasks, respectively, and can scale to
graphs with over 10 billion vertices and 40 billion edges with limited
resources.
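The inference engine's slicing scheme is easy to illustrate. Below is a
minimal sketch, not GLISP's actual API: the adjacency list, feature dict, and
plain-dict cache are stand-ins (GLISP's cache is the data-locality-aware
hybrid system described above). It shows the core point: evaluating the $K$
layers as $K$ whole-graph slices, each reading the cached outputs of the
previous slice, computes every intermediate embedding exactly once, whereas
per-vertex recursive neighborhood expansion recomputes shared neighbors many
times.

```python
import numpy as np

def layerwise_inference(adj_list, features, weights):
    """Minimal sketch of K-slice GNN inference with per-layer caching.

    adj_list: dict vertex -> list of neighbor vertices
    features: dict vertex -> input feature vector (np.ndarray)
    weights:  list of K weight matrices, one per GNN layer (slice)
    """
    cache = dict(features)  # layer-0 "embeddings" are the raw features
    for W in weights:       # one pass over all vertices per slice
        nxt = {}
        for v, nbrs in adj_list.items():
            # mean-aggregate cached neighbor embeddings (plus self)
            msgs = [cache[u] for u in nbrs] + [cache[v]]
            h = np.mean(msgs, axis=0) @ W
            nxt[v] = np.maximum(h, 0.0)  # ReLU
        cache = nxt         # the next slice reads this layer's cache
    return cache            # final embeddings for every vertex

# toy usage: a 4-vertex cycle, 3-dim features, two layers
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
feats = {v: np.random.randn(3) for v in adj}
Ws = [np.random.randn(3, 3) for _ in range(2)]
emb = layerwise_inference(adj, feats, Ws)
```

Because each slice touches every vertex once, total work is $O(K \cdot |E|)$
rather than growing with the size of the $K$-hop neighborhoods.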
Related papers
- Spectral Greedy Coresets for Graph Neural Networks [61.24300262316091]
The ubiquity of large-scale graphs in node-classification tasks hinders the real-world applications of Graph Neural Networks (GNNs).
This paper studies graph coresets for GNNs and avoids the interdependence issue by selecting ego-graphs based on their spectral embeddings.
Our spectral greedy graph coreset (SGGC) scales to graphs with millions of nodes, obviates the need for model pre-training, and applies to low-homophily graphs.
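To make the selection step concrete, here is a minimal sketch under
simplifying assumptions: each ego-graph is summarized by the smallest
eigenvalues of its normalized Laplacian (a crude stand-in for SGGC's spectral
embeddings), and a farthest-point greedy rule, one simple instance of greedy
coreset selection, picks ego-graphs that spread over the embedding space. The
function names are illustrative, not the paper's code.

```python
import numpy as np

def spectral_signature(adj, k=4):
    """Summarize an ego-graph by the k smallest eigenvalues of its
    normalized Laplacian."""
    deg = adj.sum(axis=1)
    d = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    lap = np.eye(len(adj)) - d[:, None] * adj * d[None, :]
    return np.sort(np.linalg.eigvalsh(lap))[:k]

def greedy_coreset(signatures, m):
    """Farthest-point greedy selection: repeatedly add the ego-graph
    farthest from the already-chosen set, so m picks cover the space."""
    chosen = [0]
    dists = np.linalg.norm(signatures - signatures[0], axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(dists))
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(signatures - signatures[nxt], axis=1))
    return chosen

# toy usage: 100 random 6-node ego-graphs, keep a coreset of 10
rng = np.random.default_rng(0)
egos = []
for _ in range(100):
    upper = np.triu(rng.integers(0, 2, (6, 6)), 1).astype(float)
    egos.append(upper + upper.T)
sigs = np.stack([spectral_signature(a) for a in egos])
coreset_ids = greedy_coreset(sigs, m=10)
```

Selecting whole ego-graphs rather than individual nodes is what sidesteps the
interdependence issue: each chosen example already contains the neighborhood
a GNN needs.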
arXiv Detail & Related papers (2024-05-27T17:52:12Z)
- Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-100M with a 5.9% performance improvement.
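The two ingredients are easy to sketch together. This toy version assumes
uniform fixed-size neighbor sampling and single-head dot-product attention;
LargeGT's actual sampler and attention block are more elaborate, so read this
only as the shape of the idea.

```python
import numpy as np

def sample_neighborhood(adj_list, v, size, rng):
    """Uniform fixed-size neighbor sampling (with replacement when the
    neighborhood is small), so every attention block sees equal-sized input."""
    nbrs = adj_list[v]
    return [nbrs[i] for i in rng.integers(0, len(nbrs), size=size)]

def local_attention(h_v, h_nbrs):
    """Single-head dot-product attention restricted to the sampled
    neighborhood: cost is O(sample size), independent of graph size."""
    scores = h_nbrs @ h_v / np.sqrt(len(h_v))
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ h_nbrs

# toy usage
rng = np.random.default_rng(0)
adj = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
H = rng.standard_normal((4, 8))          # one 8-dim embedding per vertex
out = local_attention(H[0], H[sample_neighborhood(adj, 0, 4, rng)])
```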
arXiv Detail & Related papers (2023-12-18T11:19:23Z)
- HUGE: Huge Unsupervised Graph Embeddings with TPUs [6.108914274067702]
Graph Embedding is a process of creating a continuous representation of nodes in a graph.
A high-performance graph embedding architecture leveraging configurable amounts of high-bandwidth memory is presented.
We verify the embedding space quality on real and synthetic large-scale datasets.
arXiv Detail & Related papers (2023-07-26T20:29:15Z)
- Learning Graph Structure from Convolutional Mixtures [119.45320143101381]
We propose a graph convolutional relationship between the observed and latent graphs, and formulate the graph learning task as a network inverse (deconvolution) problem.
In lieu of eigendecomposition-based spectral methods, we unroll and truncate proximal gradient iterations to arrive at a parameterized neural network architecture that we call a Graph Deconvolution Network (GDN).
GDNs can learn a distribution of graphs in a supervised fashion, perform link prediction or edge-weight regression tasks by adapting the loss function, and they are inherently inductive.
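A minimal sketch of the unrolling, assuming a first-order convolutional
mixture $A_{\mathrm{obs}} \approx h_0 I + h_1 S$ and an $\ell_1$ sparsity
prior on the latent graph $S$; in a GDN the step size, threshold, and filter
coefficients of each unrolled iteration become learnable layer parameters,
whereas here they are fixed constants.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm; promotes a sparse latent graph."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def graph_deconvolution(A_obs, layers=10, step=0.1, lam=0.05, h=(0.0, 1.0)):
    """Unrolled, truncated proximal gradient for network deconvolution:
    recover sparse S from A_obs ~ h0*I + h1*S. Each loop turn is one
    'layer'; a trained GDN would learn step, lam, and h instead."""
    n = A_obs.shape[0]
    h0, h1 = h
    S = np.zeros_like(A_obs)
    for _ in range(layers):
        resid = A_obs - h0 * np.eye(n) - h1 * S   # gradient of the data fit
        S = soft_threshold(S + step * h1 * resid, step * lam)
        S = 0.5 * (S + S.T)                       # keep the estimate symmetric
        np.fill_diagonal(S, 0.0)                  # no self-loops
    return S

# toy usage: deconvolve a smoothed ring graph
n = 8
S_true = np.zeros((n, n))
for i in range(n):
    S_true[i, (i + 1) % n] = S_true[(i + 1) % n, i] = 1.0
A_obs = 0.2 * np.eye(n) + 0.9 * S_true
S_hat = graph_deconvolution(A_obs, h=(0.2, 0.9))
```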
arXiv Detail & Related papers (2022-05-19T14:08:15Z)
- Imbalanced Graph Classification via Graph-of-Graph Neural Networks [16.589373163769853]
Graph Neural Networks (GNNs) have achieved unprecedented success in learning graph representations to identify categorical labels of graphs.
We introduce a novel framework, Graph-of-Graph Neural Networks (G$2$GNN), which alleviates the graph imbalance issue by deriving extra supervision globally from neighboring graphs and locally from graphs themselves.
Our proposed G$2$GNN outperforms numerous baselines by roughly 5% in both F1-macro and F1-micro scores.
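A toy sketch of the global supervision channel: cosine similarity over
precomputed graph embeddings stands in for the paper's graph-kernel kNN, and
the 0.5/0.5 mixing weight is an arbitrary illustrative choice.

```python
import numpy as np

def graph_of_graphs_knn(graph_embs, k=3):
    """Connect each graph to its k most similar graphs, forming the
    'graph of graphs' over which supervision is shared."""
    X = graph_embs / np.linalg.norm(graph_embs, axis=1, keepdims=True)
    sim = X @ X.T
    np.fill_diagonal(sim, -np.inf)               # no self-edges
    return np.argsort(-sim, axis=1)[:, :k]

def gog_smooth(logits, nbrs):
    """Global supervision: mix each graph's prediction with its
    neighbors' average, so minority-class graphs borrow signal."""
    return 0.5 * logits + 0.5 * logits[nbrs].mean(axis=1)

# toy usage: 10 graphs, 4-dim embeddings, 2-class logits
rng = np.random.default_rng(0)
embs, logits = rng.standard_normal((10, 4)), rng.standard_normal((10, 2))
smoothed = gog_smooth(logits, graph_of_graphs_knn(embs))
```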
arXiv Detail & Related papers (2021-12-01T02:25:47Z)
- LSP: Acceleration and Regularization of Graph Neural Networks via Locality Sensitive Pruning of Graphs [2.4250821950628234]
Graph Neural Networks (GNNs) have emerged as highly successful tools for graph-related tasks.
Large graphs often contain many redundant components that can be removed without compromising performance.
We propose a systematic method called Locality-Sensitive Pruning (LSP) for graph pruning based on Locality-Sensitive Hashing.
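A minimal sketch of LSH-gated edge removal. The specific rule here (keep an
edge only when its endpoints collide under random-hyperplane hashing of node
features) is one plausible reading of the approach, not necessarily the
paper's exact criterion.

```python
import numpy as np

def lsh_codes(X, n_planes, rng):
    """Random-hyperplane LSH: similar feature vectors tend to receive
    the same binary code."""
    planes = rng.standard_normal((X.shape[1], n_planes))
    return (X @ planes > 0).astype(np.uint8)

def prune_edges(edges, X, n_planes=8, seed=0):
    """Drop edges whose endpoints hash to different buckets, on the
    premise that edges between dissimilar nodes are redundant."""
    codes = lsh_codes(X, n_planes, np.random.default_rng(seed))
    return [(u, v) for (u, v) in edges if np.array_equal(codes[u], codes[v])]

# toy usage: nodes 0 and 1 are similar, node 2 is not
X = np.array([[1.0, 0.1], [0.9, 0.2], [-1.0, -0.8]])
kept = prune_edges([(0, 1), (1, 2), (0, 2)], X)
```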
arXiv Detail & Related papers (2021-11-10T14:12:28Z)
- Node Feature Extraction by Self-Supervised Multi-scale Neighborhood Prediction [123.20238648121445]
We propose a new self-supervised learning framework, Graph Information Aided Node feature exTraction (GIANT).
GIANT makes use of the eXtreme Multi-label Classification (XMC) formalism, which is crucial for fine-tuning the language model based on graph information.
We demonstrate the superior performance of GIANT over the standard GNN pipeline on Open Graph Benchmark datasets.
arXiv Detail & Related papers (2021-10-29T19:55:12Z)
- GraphTheta: A Distributed Graph Neural Network Learning System With Flexible Training Strategy [5.466414428765544]
We present a new distributed graph learning system GraphTheta.
It supports multiple training strategies and enables efficient and scalable learning on big graphs.
This work represents the largest edge-attributed GNN learning task conducted on a billion-scale network in the literature.
arXiv Detail & Related papers (2021-04-21T14:51:33Z)
- A Unified Lottery Ticket Hypothesis for Graph Neural Networks [82.31087406264437]
We present a unified GNN sparsification (UGS) framework that simultaneously prunes the graph adjacency matrix and the model weights.
We further generalize the popular lottery ticket hypothesis to GNNs for the first time, by defining a graph lottery ticket (GLT) as a pair of core sub-dataset and sparse sub-network.
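One pruning round is easy to sketch. Magnitude pruning over given score
arrays stands in for UGS's differentiable masks (which are trained jointly
with the GNN), and the per-round fractions are illustrative values in the
spirit of lottery-ticket-style iterative pruning.

```python
import numpy as np

def magnitude_prune(mask, scores, frac):
    """Zero out the lowest-|score| fraction of the still-alive entries."""
    mask, scores = mask.ravel().copy(), scores.ravel()
    alive = np.flatnonzero(mask)
    k = int(len(alive) * frac)
    if k:
        mask[alive[np.argsort(np.abs(scores[alive]))[:k]]] = 0.0
    return mask

def ugs_round(adj_mask, w_mask, adj_scores, w_scores,
              p_graph=0.05, p_model=0.20):
    """Co-sparsify: prune a small fraction of surviving edges and a
    larger fraction of surviving weights; repeating this yields the
    (sub-dataset, sub-network) pair that defines a graph lottery ticket."""
    adj_mask = magnitude_prune(adj_mask, adj_scores, p_graph).reshape(adj_scores.shape)
    w_mask = magnitude_prune(w_mask, w_scores, p_model).reshape(w_scores.shape)
    return adj_mask, w_mask

# toy usage
rng = np.random.default_rng(0)
A_mask, W_mask = np.ones((5, 5)), np.ones((5, 8))
A_mask, W_mask = ugs_round(A_mask, W_mask, rng.random((5, 5)), rng.random((5, 8)))
```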
arXiv Detail & Related papers (2021-02-12T21:52:43Z)
- MathNet: Haar-Like Wavelet Multiresolution-Analysis for Graph Representation and Learning [31.42901131602713]
We propose a framework for graph neural networks with multiresolution Haar-like wavelets, or MathNet, with interrelated convolution and pooling strategies.
The proposed MathNet outperforms various existing GNN models, especially on big data sets.
arXiv Detail & Related papers (2020-07-22T05:00:59Z)
- Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes of a large real-world graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
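The diffusion approximation underlying this approach is commonly computed
with the forward-push algorithm of Andersen et al.; a minimal single-source
sketch follows (PPRGo itself runs a distributed variant and then aggregates
features over each node's top-ranked set instead of message passing).

```python
from collections import defaultdict

def approx_ppr(adj_list, source, alpha=0.15, eps=1e-4):
    """Forward-push approximate personalized PageRank: only vertices
    whose residual exceeds eps * degree are ever touched, so the cost
    is independent of the total graph size."""
    p = defaultdict(float)   # PPR estimates
    r = defaultdict(float)   # residual probability mass still to push
    r[source] = 1.0
    queue = [source]
    while queue:
        u = queue.pop()
        deg = len(adj_list[u])
        if r[u] < eps * deg:
            continue
        p[u] += alpha * r[u]                    # keep alpha of the mass here
        share = (1.0 - alpha) * r[u] / deg      # spread the rest to neighbors
        r[u] = 0.0
        for v in adj_list[u]:
            r[v] += share
            if r[v] >= eps * len(adj_list[v]):
                queue.append(v)
    return dict(p)

# toy usage: PPR vector of vertex 0 on a small undirected graph
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
scores = approx_ppr(adj, source=0)
```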
arXiv Detail & Related papers (2020-07-03T09:30:07Z)