Teach Harder, Learn Poorer: Rethinking Hard Sample Distillation for GNN-to-MLP Knowledge Distillation
- URL: http://arxiv.org/abs/2407.14768v1
- Date: Sat, 20 Jul 2024 06:13:00 GMT
- Title: Teach Harder, Learn Poorer: Rethinking Hard Sample Distillation for GNN-to-MLP Knowledge Distillation
- Authors: Lirong Wu, Yunfan Liu, Haitao Lin, Yufei Huang, Stan Z. Li
- Abstract summary: To bridge the gap between powerful Graph Neural Networks (GNNs) and lightweight Multi-Layer Perceptrons (MLPs), GNN-to-MLP Knowledge Distillation (KD) distills knowledge from a well-trained teacher GNN into a student MLP.
This paper proposes a simple yet effective Hardness-aware GNN-to-MLP Distillation (HGMD) framework.
- Score: 56.912354708167534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To bridge the gaps between powerful Graph Neural Networks (GNNs) and lightweight Multi-Layer Perceptrons (MLPs), GNN-to-MLP Knowledge Distillation (KD) proposes to distill knowledge from a well-trained teacher GNN into a student MLP. In this paper, we revisit the knowledge samples (nodes) in teacher GNNs from the perspective of hardness, and identify that hard sample distillation may be a major performance bottleneck of existing graph KD algorithms. GNN-to-MLP KD involves two different types of hardness: a student-free knowledge hardness describing the inherent complexity of GNN knowledge, and a student-dependent distillation hardness describing the difficulty of teacher-to-student distillation. However, most existing work focuses on only one of these aspects or treats them as the same thing. This paper proposes a simple yet effective Hardness-aware GNN-to-MLP Distillation (HGMD) framework, which decouples the two hardnesses and estimates them using a non-parametric approach. Finally, two hardness-aware distillation schemes (i.e., HGMD-weight and HGMD-mixup) are further proposed to distill hardness-aware knowledge from teacher GNNs into the corresponding nodes of student MLPs. As a non-parametric distillation method, HGMD does not involve any additional learnable parameters beyond the student MLPs, but it still outperforms most of the state-of-the-art competitors. HGMD-mixup improves over the vanilla MLPs by 12.95% and outperforms its teacher GNNs by 2.48% averaged over seven real-world datasets.
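The abstract does not spell out how hardness is estimated or how it enters the loss, so the following is only a rough sketch of the general idea of hardness-weighted soft-label distillation, not the HGMD method itself. It assumes, purely for illustration, that the entropy of the teacher's softmax output serves as a parameter-free proxy for knowledge hardness, and that harder nodes are down-weighted. The function names (`softmax`, `entropy`, `kl_divergence`, `weighted_kd_loss`) and the weighting rule are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one node's class logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; an ASSUMED proxy for knowledge hardness."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between teacher distribution p and student distribution q."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)

def weighted_kd_loss(teacher_logits, student_logits, temperature=1.0):
    """Weighted average of per-node KL losses.

    Hypothetical weighting rule (not taken from the paper): a node's weight
    is 1 minus its normalized teacher entropy, so high-entropy (hard) nodes
    contribute less to the distillation loss.
    """
    losses, weights = [], []
    for t, s in zip(teacher_logits, student_logits):
        p = softmax(t, temperature)
        q = softmax(s, temperature)
        losses.append(kl_divergence(p, q))
        max_entropy = math.log(len(p))  # entropy of the uniform distribution
        if max_entropy == 0.0:          # single-class edge case
            weights.append(1.0)
        else:
            weights.append(max(0.0, 1.0 - entropy(p) / max_entropy))
    total_weight = sum(weights)
    if total_weight <= 0.0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / total_weight
```

No learnable parameters are introduced here: the weights come directly from the teacher's outputs, which is one way to read "non-parametric" in the abstract. A student-dependent distillation hardness would additionally need the student's predictions, which this sketch leaves out.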
Related papers
- A Teacher-Free Graph Knowledge Distillation Framework with Dual Self-Distillation [58.813991312803246]
We propose a Teacher-Free Graph Self-Distillation (TGS) framework that does not require any teacher model or GNNs during either training or inference.
TGS enjoys the benefits of graph topology awareness in training but is free from data dependency in inference.
Date: 2024-03-06T05:52:13Z
- VQGraph: Rethinking Graph Representation Space for Bridging GNNs and MLPs [97.63412451659826]
VQGraph learns a structure-aware tokenizer on graph data that can encode each node's local substructure as a discrete code.
VQGraph achieves new state-of-the-art performance on GNN-to-MLP distillation in both transductive and inductive settings.
Date: 2023-08-04T02:58:08Z
- Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs [42.38007308086495]
To bridge the gaps between topology-aware Graph Neural Networks (GNNs) and inference-efficient Multi-Layer Perceptrons (MLPs), GLNN proposes to distill knowledge from a well-trained teacher GNN into a student MLP.
We first quantify the knowledge reliability in GNNs by measuring the invariance of their information entropy to noise perturbations.
We propose an effective approach, namely Knowledge-inspired Reliable Distillation (KRD), that models the probability of each node being an informative and reliable knowledge point.
Date: 2023-06-09T02:23:37Z
- Extracting Low-/High- Frequency Knowledge from Graph Neural Networks and Injecting it into MLPs: An Effective GNN-to-MLP Distillation Framework [36.160251860788314]
We propose an efficient Full-Frequency GNN-to-MLP (FFG2M) distillation framework.
We factorize the knowledge learned by GNNs into low- and high-frequency components in the spectral domain.
We identify a potential information drowning problem for existing GNN-to-MLP distillation.
Date: 2023-05-18T06:57:06Z
- Grouped Knowledge Distillation for Deep Face Recognition [53.57402723008569]
The light-weight student network has difficulty fitting the target logits due to its low model capacity.
We propose a Grouped Knowledge Distillation (GKD) that retains the Primary-KD and Binary-KD but omits Secondary-KD in the ultimate KD loss calculation.
Date: 2023-04-10T09:04:38Z
- RELIANT: Fair Knowledge Distillation for Graph Neural Networks [39.22568244059485]
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks.
Knowledge Distillation (KD) is a common solution to compress GNNs.
We propose a principled framework named RELIANT to mitigate the bias exhibited by the student model.
Date: 2023-01-03T15:21:24Z
- On Self-Distilling Graph Neural Network [64.00508355508106]
We propose the first teacher-free knowledge distillation method for GNNs, termed GNN Self-Distillation (GNN-SD).
The method is built upon the proposed neighborhood discrepancy rate (NDR), which quantifies the non-smoothness of the embedded graph in an efficient way.
We also summarize a generic GNN-SD framework that could be exploited to induce other distillation strategies.
Date: 2020-11-04T12:29:33Z