Graph-less Neural Networks: Teaching Old MLPs New Tricks via
Distillation
- URL: http://arxiv.org/abs/2110.08727v1
- Date: Sun, 17 Oct 2021 05:16:58 GMT
- Title: Graph-less Neural Networks: Teaching Old MLPs New Tricks via
Distillation
- Authors: Shichang Zhang, Yozen Liu, Yizhou Sun, Neil Shah
- Abstract summary: Graph-less Neural Networks (GLNNs) have no inference graph dependency.
We show that GLNNs with competitive performance infer faster than GNNs by 146X-273X and faster than other acceleration methods by 14X-27X.
A comprehensive analysis of GLNN shows when and why GLNN can achieve competitive results to GNNs and suggests GLNN as a handy choice for latency-constrained applications.
- Score: 34.676755383361005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Neural Networks (GNNs) have recently become popular for graph machine
learning and have shown great results on a wide range of node classification tasks. Yet,
GNNs are less popular for practical deployments in the industry owing to their
scalability challenges incurred by data dependency. Namely, GNN inference
depends on neighbor nodes multiple hops away from the target, and fetching
these nodes burdens latency-constrained applications. Existing inference
acceleration methods like pruning and quantization can speed up GNNs to some
extent by reducing Multiplication-and-ACcumulation (MAC) operations. However,
their improvements are limited given the data dependency is not resolved.
Conversely, multi-layer perceptrons (MLPs) have no dependency on graph data and
infer much faster than GNNs, even though they are less accurate than GNNs for
node classification in general. Motivated by these complementary strengths and
weaknesses, we bring GNNs and MLPs together via knowledge distillation (KD).
Our work shows that the performance of MLPs can be improved by large margins
with GNN KD. We call the distilled MLPs Graph-less Neural Networks (GLNNs) as
they have no inference graph dependency. We show that GLNNs with competitive
performance infer 146X-273X faster than GNNs and 14X-27X faster than other
acceleration methods. Meanwhile, under a production setting
involving both transductive and inductive predictions across 7 datasets, GLNN
accuracies improve over standalone MLPs by 12.36% on average and match GNNs on
6/7 datasets. A comprehensive analysis of GLNN shows when and why GLNN can
achieve competitive results to GNNs and suggests GLNN as a handy choice for
latency-constrained applications.
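The recipe described above is standard soft-label knowledge distillation from a trained GNN teacher into an MLP student. The following PyTorch sketch illustrates that recipe; the loss weight lam, temperature tau, and the assumption that teacher logits are precomputed are illustrative choices, not settings from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPStudent(nn.Module):
    """Plain MLP: needs only node features at inference, no graph."""
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hid_dim, n_classes),
        )

    def forward(self, x):
        return self.net(x)

def distill_step(student, optimizer, x, y, teacher_logits,
                 lam=0.5, tau=1.0):
    """One training step: cross-entropy on true labels plus KL divergence
    to the GNN teacher's soft predictions (teacher_logits precomputed)."""
    student.train()
    optimizer.zero_grad()
    logits = student(x)
    ce = F.cross_entropy(logits, y)
    kd = F.kl_div(
        F.log_softmax(logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau * tau
    loss = lam * ce + (1.0 - lam) * kd
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference the distilled student is applied to node features alone, which is what removes the neighbor-fetching latency discussed in the abstract.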
Related papers
- AdaGMLP: AdaBoosting GNN-to-MLP Knowledge Distillation [15.505402580010104]
A new wave of methods, collectively known as GNN-to-MLP Knowledge Distillation, has emerged.
They aim to transfer GNN-learned knowledge to a more efficient student.
These methods face challenges in situations with insufficient training data and incomplete test data.
We propose AdaGMLP, an AdaBoosting GNN-to-MLP Knowledge Distillation framework.
arXiv Detail & Related papers (2024-05-23T08:28:44Z)
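The AdaGMLP entry names an AdaBoosting GNN-to-MLP distillation framework without spelling out the procedure. The sketch below only illustrates the generic SAMME-style boosting idea of training several MLP students against the GNN teacher's hard pseudo-labels and upweighting nodes the current student misclassifies; the helper callbacks make_student and train_fn and the reweighting details are illustrative assumptions, not the AdaGMLP algorithm.

```python
import math
import torch
import torch.nn.functional as F

def boost_mlp_students(make_student, train_fn, x, teacher_labels,
                       n_classes, n_students=3):
    """Illustrative SAMME-style boosting against the GNN teacher's hard
    pseudo-labels. make_student() builds a fresh MLP; train_fn(model, x,
    y, w) trains it with per-node sample weights w."""
    n = x.shape[0]
    w = torch.full((n,), 1.0 / n)            # per-node sample weights
    students, alphas = [], []
    for _ in range(n_students):
        model = make_student()
        train_fn(model, x, teacher_labels, w)
        with torch.no_grad():
            pred = model(x).argmax(dim=-1)
        miss = (pred != teacher_labels).float()
        err = float((w * miss).sum() / w.sum())
        if err >= 1.0 - 1.0 / n_classes:      # no better than chance
            break
        err = max(err, 1e-6)                  # keep alpha finite
        alpha = math.log((1 - err) / err) + math.log(n_classes - 1)
        w = w * torch.exp(alpha * miss)       # upweight misclassified nodes
        w = w / w.sum()
        students.append(model)
        alphas.append(alpha)
    return students, alphas

def ensemble_predict(students, alphas, x):
    """Weighted vote over the students' class probabilities."""
    with torch.no_grad():
        probs = sum(a * F.softmax(m(x), dim=-1)
                    for m, a in zip(students, alphas))
    return probs.argmax(dim=-1)
```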
- A Teacher-Free Graph Knowledge Distillation Framework with Dual Self-Distillation [58.813991312803246]
We propose a Teacher-Free Graph Self-Distillation (TGS) framework that does not require any teacher model or GNNs during both training and inference.
TGS enjoys the benefits of graph topology awareness in training but is free from data dependency in inference.
arXiv Detail & Related papers (2024-03-06T05:52:13Z)
- Information Flow in Graph Neural Networks: A Clinical Triage Use Case [49.86931948849343]
Graph Neural Networks (GNNs) have gained popularity in healthcare and other domains due to their ability to process multi-modal and multi-relational graphs.
We investigate how the flow of embedding information within GNNs affects the prediction of links in Knowledge Graphs (KGs).
Our results demonstrate that incorporating domain knowledge into the GNN connectivity leads to better performance than using the same connectivity as the KG or allowing unconstrained embedding propagation.
arXiv Detail & Related papers (2023-09-12T09:18:12Z)
- Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs [71.93227401463199]
This paper traces the major source of GNNs' performance gains to their intrinsic generalization capability by introducing an intermediate model class dubbed P(ropagational)MLP.
We observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient in training.
arXiv Detail & Related papers (2022-12-18T08:17:32Z)
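The PMLP entry above does not spell out the construction; one common reading is that a PMLP is trained exactly like a plain MLP and only applies parameter-free message passing at test time. The sketch below follows that reading; the mean aggregation and the (2, E) edge_index format are illustrative assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

class PMLPSketch(nn.Module):
    """Illustrative 'propagational MLP': an ordinary MLP during training,
    with optional parameter-free neighbor mean-aggregation at inference."""
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim)
        self.lin2 = nn.Linear(hid_dim, n_classes)

    @staticmethod
    def propagate(h, edge_index):
        """Mean-aggregate neighbor states; edge_index is a (2, E) tensor
        of (src, dst) pairs, assumed to include self-loops."""
        src, dst = edge_index
        out = torch.zeros_like(h).index_add_(0, dst, h[src])
        deg = torch.zeros(h.size(0), device=h.device).index_add_(
            0, dst, torch.ones(dst.size(0), device=h.device))
        return out / deg.clamp(min=1).unsqueeze(-1)

    def forward(self, x, edge_index=None):
        h = torch.relu(self.lin1(x))
        if edge_index is not None and not self.training:
            h = self.propagate(h, edge_index)   # used only at test time
        return self.lin2(h)
```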
- Distributed Graph Neural Network Training: A Survey [51.77035975191926]
Graph neural networks (GNNs) are deep learning models that are trained on graphs and have been successfully applied in various domains.
Despite the effectiveness of GNNs, it is still challenging for GNNs to efficiently scale to large graphs.
As a remedy, distributed computing becomes a promising solution of training large-scale GNNs.
arXiv Detail & Related papers (2022-11-01T01:57:00Z)
- Teaching Yourself: Graph Self-Distillation on Neighborhood for Node Classification [42.840122801915996]
We propose a Graph Self-Distillation on Neighborhood (GSDN) framework to reduce the gap between GNNs and MLPs.
GSDN infers 75X faster than existing GNNs and 16X-25X faster than other inference acceleration methods.
arXiv Detail & Related papers (2022-10-05T08:35:34Z)
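The GSDN entry only names the framework. The sketch below illustrates one plausible reading of graph self-distillation on neighborhood: an MLP trained on labels is also pulled toward the average of its own predictions over each node's neighbors, so the graph is used only as a training-time signal and is not needed at inference. The aggregation, the loss weight mu, and the edge_index format are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def neighborhood_self_distillation_loss(logits, edge_index, y, mu=1.0, tau=1.0):
    """Cross-entropy on labels plus a KL term aligning each node's
    prediction with the mean prediction of its neighbors. The neighbor
    average is detached so it acts as a (self-)teacher signal."""
    src, dst = edge_index                      # (2, E) tensor of edges
    probs = F.softmax(logits / tau, dim=-1)
    n = logits.size(0)
    agg = torch.zeros_like(probs).index_add_(0, dst, probs[src])
    deg = torch.zeros(n, device=logits.device).index_add_(
        0, dst, torch.ones(dst.size(0), device=logits.device))
    neigh = (agg / deg.clamp(min=1).unsqueeze(-1)).detach()
    ce = F.cross_entropy(logits, y)
    kd = F.kl_div(F.log_softmax(logits / tau, dim=-1), neigh,
                  reduction="batchmean") * tau * tau
    return ce + mu * kd
```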
- Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth [57.10183643449905]
Graph Neural Networks (GNNs) have been studied from the lens of expressive power and generalization.
We study the optimization dynamics of GNNs, focusing on how skip connections and increased depth affect training.
Our results provide the first theoretical support for the success of GNNs.
arXiv Detail & Related papers (2021-05-10T17:59:01Z)
- A Unified Lottery Ticket Hypothesis for Graph Neural Networks [82.31087406264437]
We present a unified GNN sparsification (UGS) framework that simultaneously prunes the graph adjacency matrix and the model weights.
We further generalize the popular lottery ticket hypothesis to GNNs for the first time, by defining a graph lottery ticket (GLT) as a pair of core sub-dataset and sparse sub-network.
arXiv Detail & Related papers (2021-02-12T21:52:43Z)
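The UGS entry describes pruning the graph adjacency matrix and the model weights simultaneously. The sketch below shows that idea in its simplest form: learnable masks on a dense adjacency matrix and on one GCN-style weight matrix, with the lowest-magnitude mask entries zeroed after training. The masking and pruning details are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class MaskedGraphLinear(nn.Module):
    """Joint sparsification sketch: learnable masks on a dense adjacency
    matrix A (N x N) and on one GCN-style weight matrix W."""
    def __init__(self, adj, in_dim, out_dim):
        super().__init__()
        self.register_buffer("adj", adj)                 # fixed dense adjacency
        self.adj_mask = nn.Parameter(torch.ones_like(adj))
        self.weight = nn.Parameter(torch.randn(in_dim, out_dim) * 0.01)
        self.w_mask = nn.Parameter(torch.ones(in_dim, out_dim))

    def forward(self, x):
        a = self.adj * self.adj_mask                     # masked graph
        w = self.weight * self.w_mask                    # masked weights
        return a @ (x @ w)

    @torch.no_grad()
    def prune(self, graph_sparsity=0.05, weight_sparsity=0.2):
        """Zero the lowest-magnitude fraction of each mask, yielding a
        sparser graph and a sparser sub-network (a candidate 'ticket')."""
        for mask, frac in ((self.adj_mask, graph_sparsity),
                           (self.w_mask, weight_sparsity)):
            k = int(mask.numel() * frac)
            if k == 0:
                continue
            thresh = mask.abs().flatten().kthvalue(k).values
            mask[mask.abs() <= thresh] = 0.0
```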
- On the Bottleneck of Graph Neural Networks and its Practical Implications [22.704284264177108]
We show that graph neural networks (GNNs) are susceptible to a bottleneck when aggregating messages across a long path.
This bottleneck causes the over-squashing of exponentially growing information into fixed-size vectors.
GNNs fail to propagate messages originating from distant nodes and perform poorly when the prediction task depends on long-range interaction.
arXiv Detail & Related papers (2020-06-09T12:04:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.