Rethinking Efficiency and Redundancy in Training Large-scale Graphs
- URL: http://arxiv.org/abs/2209.00800v1
- Date: Fri, 2 Sep 2022 03:25:32 GMT
- Title: Rethinking Efficiency and Redundancy in Training Large-scale Graphs
- Authors: Xin Liu, Xunbin Xiong, Mingyu Yan, Runzhen Xue, Shirui Pan, Xiaochun
Ye, Dongrui Fan
- Abstract summary: We argue that redundancy exists in large-scale graphs and will degrade the training efficiency.
Despite recent advances in sampling-based training methods, sampling-based GNNs generally overlook the redundancy issue.
We propose DropReef to detect and drop the redundancy in large-scale graphs once and for all.
- Score: 26.982614602436655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale graphs are ubiquitous in real-world scenarios and can be trained
by Graph Neural Networks (GNNs) to generate representations for downstream
tasks. Given the abundant information and complex topology of a large-scale
graph, we argue that redundancy exists in such graphs and degrades training
efficiency. Unfortunately, limited model scalability severely restricts the
efficiency of training large-scale graphs with vanilla GNNs. Despite recent
advances in sampling-based training methods, sampling-based GNNs generally
overlook the redundancy issue, and training these models on large-scale graphs
still takes an intolerable amount of time. We therefore propose to drop
redundancy and improve the efficiency of training large-scale graphs with
GNNs by rethinking the inherent characteristics of a graph.
In this paper, we propose a once-for-all method, termed DropReef, to drop the
redundancy in large-scale graphs. Specifically, we first conduct preliminary
experiments to explore potential redundancy in large-scale graphs. Next, we
present a metric to quantify the neighbor heterophily of every node in a
graph. Based on both experimental and theoretical analysis, we identify the
redundancy in a large-scale graph as nodes with high neighbor heterophily and
a large number of neighbors. We then propose DropReef to detect and drop this
redundancy once and for all, reducing training time without sacrificing model
accuracy. To demonstrate the effectiveness of DropReef, we apply it to recent
state-of-the-art sampling-based GNNs for training large-scale graphs, owing to
the high accuracy of such models. With DropReef, the training efficiency of
these models can be greatly improved. DropReef is highly compatible and is
performed offline, benefiting both current and future state-of-the-art
sampling-based GNNs to a significant extent.
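As a rough illustration of the idea described above, the sketch below computes a simple
neighbor-heterophily score for every node (the fraction of its neighbors whose label
differs from its own, a common proxy; the paper's actual metric is not given in this
abstract and may differ) and then drops, once and offline, the nodes that combine high
heterophily with a large neighborhood. Function names and thresholds are hypothetical
and used for illustration only.

```python
import numpy as np
import scipy.sparse as sp

def neighbor_heterophily(adj: sp.csr_matrix, labels: np.ndarray) -> np.ndarray:
    """Fraction of each node's neighbors whose label differs from its own.
    Illustrative proxy only; the paper's metric may be defined differently."""
    scores = np.zeros(adj.shape[0])
    for v in range(adj.shape[0]):
        neighbors = adj.indices[adj.indptr[v]:adj.indptr[v + 1]]
        if neighbors.size > 0:
            scores[v] = np.mean(labels[neighbors] != labels[v])
    return scores

def drop_redundant_nodes(adj, labels, het_thresh=0.8, deg_thresh=100):
    """Offline, once-for-all filtering in the spirit of DropReef: remove nodes
    with both high neighbor heterophily and many neighbors, then hand the
    reduced graph to any sampling-based GNN. Thresholds are illustrative."""
    degrees = np.asarray(adj.sum(axis=1)).ravel()
    het = neighbor_heterophily(adj, labels)
    keep = np.where(~((het > het_thresh) & (degrees > deg_thresh)))[0]
    return adj[keep][:, keep], labels[keep], keep
```

Because the filtering runs once before training, any existing or future sampling-based
GNN can consume the reduced graph without modification, which is the compatibility
property the abstract emphasizes.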
Related papers
- Faster Inference Time for GNNs using coarsening [1.323700980948722]
Coarsening-based methods reduce the graph to a smaller one, resulting in faster computation.
No previous research has tackled the cost during inference.
This paper presents a novel approach to improving the scalability of GNNs through subgraph-based techniques.
arXiv Detail & Related papers (2024-10-19T06:27:24Z)
- Do We Really Need Graph Convolution During Training? Light Post-Training Graph-ODE for Efficient Recommendation [34.93725892725111]
The use of graph convolution networks (GCNs) in training recommender systems (RecSys) has raised persistent concerns.
This paper presents a critical examination of the necessity of graph convolutions during the training phase.
We introduce an innovative alternative: the Light Post-Training Graph Ordinary-Differential-Equation (LightGODE).
arXiv Detail & Related papers (2024-07-26T17:59:32Z)
- Enhancing Size Generalization in Graph Neural Networks through Disentangled Representation Learning [7.448831299106425]
DISGEN is a model-agnostic framework designed to disentangle size factors from graph representations.
Our empirical results show that DISGEN outperforms the state-of-the-art models by up to 6% on real-world datasets.
arXiv Detail & Related papers (2024-06-07T03:19:24Z)
- Graph Unlearning with Efficient Partial Retraining [28.433619085748447]
Graph Neural Networks (GNNs) have achieved remarkable success in various real-world applications.
GNNs may be trained on undesirable graph data, which can degrade their performance and reliability.
We propose GraphRevoker, a novel graph unlearning framework that better maintains the model utility of unlearnable GNNs.
arXiv Detail & Related papers (2024-03-12T06:22:10Z)
- Learning to Reweight for Graph Neural Network [63.978102332612906]
Graph Neural Networks (GNNs) show promising results for graph tasks.
The generalization ability of existing GNNs degrades when there are distribution shifts between testing and training graph data.
We propose a novel nonlinear graph decorrelation method, which can substantially improve the out-of-distribution generalization ability.
arXiv Detail & Related papers (2023-12-19T12:25:10Z)
- Fast and Effective GNN Training with Linearized Random Spanning Trees [20.73637495151938]
We present a new effective and scalable framework for training GNNs in node classification tasks.
Our approach progressively refines the GNN weights on an extensive sequence of random spanning trees.
The sparse nature of these path graphs substantially lightens the computational burden of GNN training (a minimal tree-sampling sketch appears after this list).
arXiv Detail & Related papers (2023-06-07T23:12:42Z)
- A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs).
We present a new ensembling training manner, named EnGCN, to address the existing issues.
Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
- Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks [52.566735716983956]
We propose a graph gradual pruning framework termed CGP to dynamically prune GNNs.
Unlike LTH-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs.
Our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
arXiv Detail & Related papers (2022-07-18T14:23:31Z)
- Scaling R-GCN Training with Graph Summarization [71.06855946732296]
Training of Relational Graph Convolutional Networks (R-GCN) does not scale well with the size of the graph.
In this work, we experiment with the use of graph summarization techniques to compress the graph.
We obtain reasonable results on the AIFB, MUTAG and AM datasets.
arXiv Detail & Related papers (2022-03-05T00:28:43Z)
- Neural Graph Matching for Pre-training Graph Neural Networks [72.32801428070749]
Graph neural networks (GNNs) have shown powerful capacity at modeling structural data.
We present a novel Graph Matching based GNN Pre-Training framework, called GMPT.
The proposed method can be applied to fully self-supervised pre-training and coarse-grained supervised pre-training.
arXiv Detail & Related papers (2022-03-03T09:53:53Z)
- OOD-GNN: Out-of-Distribution Generalized Graph Neural Network [73.67049248445277]
Graph neural networks (GNNs) have achieved impressive performance when testing and training graph data come from identical distribution.
Existing GNNs lack out-of-distribution generalization abilities, so their performance substantially degrades when there are distribution shifts between testing and training graph data.
We propose an out-of-distribution generalized graph neural network (OOD-GNN) for achieving satisfactory performance on unseen testing graphs whose distributions differ from those of the training graphs.
arXiv Detail & Related papers (2021-12-07T16:29:10Z)
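The "Fast and Effective GNN Training with Linearized Random Spanning Trees" entry above
refines GNN weights on a sequence of random spanning trees. As a minimal sketch of how
such a tree can be drawn, assuming the graph is given as an undirected edge list, the
snippet below runs Kruskal's algorithm over a randomly shuffled edge list (randomized
Kruskal); the paper's actual sampler and its linearization of trees into path graphs are
not reproduced here.

```python
import random

def random_spanning_tree(num_nodes, edges, seed=None):
    """Sample a spanning tree by running Kruskal's algorithm over a shuffled
    edge list. Assumes a connected, undirected graph given as (u, v) pairs.
    Illustrative only; the referenced paper may sample trees differently."""
    rng = random.Random(seed)
    parent = list(range(num_nodes))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    shuffled = list(edges)
    rng.shuffle(shuffled)
    tree = []
    for u, v in shuffled:
        ru, rv = find(u), find(v)
        if ru != rv:  # adding (u, v) does not close a cycle
            parent[ru] = rv
            tree.append((u, v))
            if len(tree) == num_nodes - 1:
                break
    return tree
```

Repeating the sampling yields the sequence of sparse subgraphs on which the GNN weights
are progressively refined.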