Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs
- URL: http://arxiv.org/abs/2306.05628v1
- Date: Fri, 9 Jun 2023 02:23:37 GMT
- Title: Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs
- Authors: Lirong Wu, Haitao Lin, Yufei Huang, Stan Z. Li
- Abstract summary: To bridge the gaps between topology-aware Graph Neural Networks (GNNs) and inference-efficient Multi-Layer Perceptrons (MLPs), GLNN proposes to distill knowledge from a well-trained teacher GNN into a student MLP.
We first quantify the knowledge reliability in GNNs by measuring the invariance of their information entropy to noise perturbations.
We propose an effective approach, namely Knowledge-inspired Reliable Distillation (KRD), that models the probability of each node being an informative and reliable knowledge point.
- Score: 42.38007308086495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To bridge the gaps between topology-aware Graph Neural Networks (GNNs) and
inference-efficient Multi-Layer Perceptrons (MLPs), GLNN proposes to distill
knowledge from a well-trained teacher GNN into a student MLP. Despite their
great progress, comparatively little work has been done to explore the
reliability of different knowledge points (nodes) in GNNs, especially their
roles played during distillation. In this paper, we first quantify the
knowledge reliability in GNNs by measuring the invariance of their information
entropy to noise perturbations, from which we observe that different knowledge
points (1) show different distillation speeds (temporally); (2) are
differentially distributed in the graph (spatially). To achieve reliable
distillation, we propose an effective approach, namely Knowledge-inspired
Reliable Distillation (KRD), that models the probability of each node being an
informative and reliable knowledge point, based on which we sample a set of
additional reliable knowledge points as supervision for training student MLPs.
Extensive experiments show that KRD improves over the vanilla MLPs by 12.62%
and outperforms its corresponding teacher GNNs by 2.16% averaged over 7
datasets and 3 GNN architectures.
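
The abstract describes two steps: scoring each node by how invariant the entropy of its prediction is to noise, and then sampling reliable nodes from a probability built on that score. The sketch below is one plausible reading of those steps, not the paper's formulation. The names `entropy_invariance` and `sample_reliable`, the Gaussian noise model, the squared entropy difference, and the inverse-score weighting are all assumptions made for illustration, and the `predict` argument stands in for a trained teacher GNN (a real teacher would also take the graph structure as input).

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with the usual max-shift for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy(p, eps=1e-12):
    # Shannon entropy of each row of a probability matrix.
    return -(p * np.log(p + eps)).sum(axis=1)

def entropy_invariance(predict, X, sigma=0.1, n_samples=20, seed=0):
    """Mean squared change in per-node prediction entropy under Gaussian
    feature noise. Lower values mean the entropy is more invariant, which
    this sketch treats as "more reliable" (an assumption)."""
    rng = np.random.default_rng(seed)
    h_clean = entropy(predict(X))
    sq_diff = np.zeros(X.shape[0])
    for _ in range(n_samples):
        X_noisy = X + sigma * rng.standard_normal(X.shape)
        sq_diff += (entropy(predict(X_noisy)) - h_clean) ** 2
    return sq_diff / n_samples

def sample_reliable(scores, k, seed=0):
    """Sample k distinct node indices, favouring nodes with low scores.
    The inverse-score weighting is a hypothetical choice for this sketch."""
    rng = np.random.default_rng(seed)
    weights = 1.0 / (scores + 1e-8)
    probs = weights / weights.sum()
    return rng.choice(len(scores), size=k, replace=False, p=probs)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((30, 5))   # toy node features
    W = rng.standard_normal((5, 3))    # toy linear "teacher" weights
    teacher = lambda feats: softmax(feats @ W)
    scores = entropy_invariance(teacher, X)
    print(sample_reliable(scores, k=10))
```

In a distillation setting, the sampled indices would mark the nodes whose teacher outputs are used as extra supervision for the student MLP; how those nodes enter the loss is left out here because this page does not specify it.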
Related papers
- Teach Harder, Learn Poorer: Rethinking Hard Sample Distillation for GNN-to-MLP Knowledge Distillation [56.912354708167534]
To bridge the gaps between Graph Neural Networks (GNNs) and lightweight Multi-Layer Perceptrons (MLPs), GNN-to-MLP Knowledge Distillation (KD) proposes to distill knowledge from a well-trained teacher GNN into a student MLP.
This paper proposes a simple yet effective Hardness-aware GNN-to-MLP Distillation (HGMD) framework.
arXiv Detail & Related papers (2024-07-20T06:13:00Z)
- A Teacher-Free Graph Knowledge Distillation Framework with Dual Self-Distillation [58.813991312803246]
We propose a Teacher-Free Graph Self-Distillation (TGS) framework that does not require any teacher model or GNNs during both training and inference.
TGS enjoys the benefits of graph topology awareness in training but is free from data dependency in inference.
arXiv Detail & Related papers (2024-03-06T05:52:13Z)
- VQGraph: Rethinking Graph Representation Space for Bridging GNNs and MLPs [97.63412451659826]
VQGraph learns a structure-aware tokenizer on graph data that can encode each node's local substructure as a discrete code.
VQGraph achieves new state-of-the-art performance on GNN-to-MLP distillation in both transductive and inductive settings.
arXiv Detail & Related papers (2023-08-04T02:58:08Z)
- Extracting Low-/High- Frequency Knowledge from Graph Neural Networks and Injecting it into MLPs: An Effective GNN-to-MLP Distillation Framework [36.160251860788314]
We propose an efficient Full-Frequency GNN-to-MLP (FFG2M) distillation framework.
We factorize the knowledge learned by GNNs into low- and high-frequency components in the spectral domain.
We identify a potential information drowning problem for existing GNN-to-MLP distillation.
arXiv Detail & Related papers (2023-05-18T06:57:06Z)
- Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs [71.93227401463199]
This paper pinpoints the major source of GNNs' performance gain to their intrinsic capability, by introducing an intermediate model class dubbed as P(ropagational)MLP.
We observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient in training.
arXiv Detail & Related papers (2022-12-18T08:17:32Z)
- Teaching Yourself: Graph Self-Distillation on Neighborhood for Node Classification [42.840122801915996]
We propose a Graph Self-Distillation on Neighborhood (GSDN) framework to reduce the gap between GNNs and MLPs.
GSDN infers 75XX faster than existing GNNs and 16X-25X faster than other inference acceleration methods.
arXiv Detail & Related papers (2022-10-05T08:35:34Z)
- On Self-Distilling Graph Neural Network [64.00508355508106]
We propose the first teacher-free knowledge distillation method for GNNs, termed GNN Self-Distillation (GNN-SD).
The method is built upon the proposed neighborhood discrepancy rate (NDR), which quantifies the non-smoothness of the embedded graph in an efficient way.
We also summarize a generic GNN-SD framework that could be exploited to induce other distillation strategies.
arXiv Detail & Related papers (2020-11-04T12:29:33Z)