Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural
Networks
- URL: http://arxiv.org/abs/2302.02922v1
- Date: Mon, 6 Feb 2023 16:54:20 GMT
- Title: Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural
Networks
- Authors: Shuai Zhang, Meng Wang, Pin-Yu Chen, Sijia Liu, Songtao Lu, Miao Liu
- Abstract summary: This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs).
It proves analytically that both sampling important nodes and pruning the lowest-magnitude neurons can reduce the sample complexity and improve convergence without compromising the test accuracy.
- Score: 89.28881869440433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the significant computational challenge of training large-scale graph
neural networks (GNNs), various sparse learning techniques have been exploited
to reduce memory and storage costs. Examples include \textit{graph
sparsification} that samples a subgraph to reduce the amount of data
aggregation and \textit{model sparsification} that prunes the neural network to
reduce the number of trainable weights. Despite the empirical successes in
reducing the training cost while maintaining the test accuracy, the theoretical
generalization analysis of sparse learning for GNNs remains elusive. To the
best of our knowledge, this paper provides the first theoretical
characterization of joint edge-model sparse learning from the perspective of
sample complexity and convergence rate in achieving zero generalization error.
It proves analytically that both sampling important nodes and pruning neurons
with the lowest magnitude can reduce the sample complexity and improve
convergence without compromising the test accuracy. Although the analysis is
centered on two-layer GNNs with structural constraints on data, the insights
are applicable to more general setups and justified by both synthetic and
practical citation datasets.
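For intuition only, here is a minimal sketch (PyTorch; my own construction, not code from the paper) of the two ingredients the abstract combines: keeping only a subset of edges ranked by an importance score before aggregation, and pruning the hidden neurons of a two-layer GNN whose incoming weights have the smallest magnitude. The score definition, the keep/prune ratios, and the helper names (sparsify_edges, prune_neurons) are illustrative placeholders, not the authors' actual criteria.
```python
# Minimal sketch of joint edge-model sparsification for a two-layer GNN.
# Assumptions (not from the paper): edge "importance" is an arbitrary score,
# and neuron pruning zeroes the first-layer rows with the smallest L2 norm.
import torch

def sparsify_edges(edge_index, scores, keep_ratio=0.5):
    """Keep the top `keep_ratio` fraction of edges by importance score."""
    num_keep = max(1, int(keep_ratio * edge_index.shape[1]))
    top = torch.topk(scores, num_keep).indices
    return edge_index[:, top]

def aggregate(x, edge_index, num_nodes):
    """Mean-aggregate neighbor features along the (sparsified) edges."""
    src, dst = edge_index
    out = torch.zeros(num_nodes, x.shape[1])
    out.index_add_(0, dst, x[src])
    deg = torch.zeros(num_nodes).index_add_(0, dst, torch.ones(dst.shape[0]))
    return out / deg.clamp(min=1).unsqueeze(1)

class TwoLayerGNN(torch.nn.Module):
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.W1 = torch.nn.Linear(d_in, d_hidden)
        self.W2 = torch.nn.Linear(d_hidden, 1)

    def prune_neurons(self, keep_ratio=0.5):
        """Zero out hidden neurons whose incoming weights have the smallest magnitude."""
        norms = self.W1.weight.norm(dim=1)           # one norm per hidden neuron
        num_keep = max(1, int(keep_ratio * norms.numel()))
        keep = torch.topk(norms, num_keep).indices
        mask = torch.zeros_like(norms)
        mask[keep] = 1.0
        with torch.no_grad():
            self.W1.weight *= mask.unsqueeze(1)      # prune first-layer rows
            self.W2.weight *= mask.unsqueeze(0)      # and the matching second-layer columns
        return mask

    def forward(self, x, edge_index):
        h = aggregate(x, edge_index, x.shape[0])     # graph aggregation
        return self.W2(torch.relu(self.W1(h)))       # two-layer readout

# Example usage on a toy graph (4 nodes, random features, illustrative scores):
# x = torch.randn(4, 8); edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]])
# scores = torch.rand(edge_index.shape[1])
# model = TwoLayerGNN(8, 16); model.prune_neurons(0.5)
# out = model(x, sparsify_edges(edge_index, scores, 0.5))
```
In the paper, the importance weights and the pruning step come out of the analysis and are interleaved with training; this sketch applies both operations once, purely to make joint edge-model sparsification concrete.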
Related papers
- Learning and generalization of one-hidden-layer neural networks, going
beyond standard Gaussian data [14.379261299138147]
This paper analyzes the convergence and iteration complexity of training a one-hidden-layer neural network when the input features follow a Gaussian mixture model.
For the first time, it characterizes the impact of the input distribution on the sample complexity and the learning rate.
arXiv Detail & Related papers (2022-07-07T23:27:44Z) - Generalization Guarantee of Training Graph Convolutional Networks with
Graph Topology Sampling [83.77955213766896]
Graph convolutional networks (GCNs) have recently achieved great empirical success in learning graph-structured data.
To address their scalability issue, graph topology sampling has been proposed to reduce the memory and computational cost of training GCNs.
This paper provides the first theoretical justification of graph topology sampling in training (up to) three-layer GCNs.
arXiv Detail & Related papers (2022-07-07T21:25:55Z) - How does unlabeled data improve generalization in self-training? A
one-hidden-layer theoretical analysis [93.37576644429578]
This work establishes the first theoretical analysis for the known iterative self-training paradigm.
We prove the benefits of unlabeled data in both training convergence and generalization ability.
Experiments on both shallow and deep neural networks are also provided to corroborate our theoretical insights on self-training.
arXiv Detail & Related papers (2022-01-21T02:16:52Z) - Tackling Oversmoothing of GNNs with Contrastive Learning [35.88575306925201]
Graph neural networks (GNNs) combine the relational structure of graph data with a representation learning capability.
Oversmoothing makes the final representations of nodes indiscriminative, thus deteriorating the node classification and link prediction performance.
We propose the Topology-guided Graph Contrastive Layer, named TGCL, the first de-oversmoothing method that maintains all three of the metrics considered in that work.
arXiv Detail & Related papers (2021-10-26T15:56:16Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by characterizing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z) - A Revision of Neural Tangent Kernel-based Approaches for Neural Networks [34.75076385561115]
We use the neural tangent kernel (NTK) to show that such networks can fit any finite training sample perfectly.
A simple and analytic kernel function is derived and shown to be equivalent to a fully-trained network.
Our tighter analysis resolves the scaling problem and enables the validation of the original NTK-based results (a minimal numerical illustration of the empirical NTK appears after this list).
arXiv Detail & Related papers (2020-07-02T05:07:55Z) - Fast Learning of Graph Neural Networks with Guaranteed Generalizability:
One-hidden-layer Case [93.37576644429578]
Graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice.
We provide a theoretically-grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems.
arXiv Detail & Related papers (2020-06-25T00:45:52Z)
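As referenced in the NTK entry above, the following is a minimal numerical sketch (PyTorch; an illustration under stated assumptions, not the paper's code) of the interpolation claim: it forms the empirical NTK Gram matrix K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)> of a one-hidden-layer ReLU network on a small random sample and checks that its smallest eigenvalue is positive, which is what lets the linearized (kernel) model fit the sample exactly. The width, sample size, and Gaussian data are arbitrary choices for illustration.
```python
# Minimal sketch: empirical NTK Gram matrix of a one-hidden-layer network.
# K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>; a strictly positive smallest
# eigenvalue on a finite sample is what lets the linearized model interpolate it.
import torch

torch.manual_seed(0)
n, d, width = 20, 5, 256          # illustrative sample size, input dim, hidden width
X = torch.randn(n, d)             # random Gaussian inputs (assumption for this demo)

net = torch.nn.Sequential(
    torch.nn.Linear(d, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)
params = [p for p in net.parameters() if p.requires_grad]

def per_sample_grad(x):
    """Flattened gradient of the scalar network output w.r.t. all parameters."""
    out = net(x.unsqueeze(0)).squeeze()
    grads = torch.autograd.grad(out, params)
    return torch.cat([g.reshape(-1) for g in grads])

J = torch.stack([per_sample_grad(X[i]) for i in range(n)])  # (n, num_params)
K = J @ J.T                                                 # empirical NTK Gram matrix

eigvals = torch.linalg.eigvalsh(K)
print(f"smallest NTK eigenvalue: {eigvals.min().item():.4f}")  # > 0 generically
```
With distinct, generic inputs and a sufficiently wide hidden layer, this smallest eigenvalue is typically bounded away from zero, which is the finite-sample counterpart of the perfect-fitting claim.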