A Teacher-Free Graph Knowledge Distillation Framework with Dual
Self-Distillation
- URL: http://arxiv.org/abs/2403.03483v1
- Date: Wed, 6 Mar 2024 05:52:13 GMT
- Title: A Teacher-Free Graph Knowledge Distillation Framework with Dual
Self-Distillation
- Authors: Lirong Wu, Haitao Lin, Zhangyang Gao, Guojiang Zhao, Stan Z. Li
- Abstract summary: We propose a Teacher-Free Graph Self-Distillation (TGS) framework that requires neither a teacher model nor GNNs in either training or inference.
TGS enjoys the benefits of graph topology awareness in training but is free from data dependency in inference.
- Score: 58.813991312803246
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have witnessed great success in handling graph-related tasks
with Graph Neural Networks (GNNs). Despite this academic success,
Multi-Layer Perceptrons (MLPs) remain the primary workhorse for practical
industrial applications. One reason for such an academic-industry gap is the
neighborhood-fetching latency incurred by data dependency in GNNs. To reduce
this gap, Graph Knowledge Distillation (GKD) has been proposed, usually based on a
standard teacher-student architecture, to distill knowledge from a large
teacher GNN into a lightweight student GNN or MLP. However, we found in this
paper that neither teachers nor GNNs are necessary for graph knowledge
distillation. We propose a Teacher-Free Graph Self-Distillation (TGS) framework
that requires neither a teacher model nor GNNs in either training or
inference. More importantly, the proposed TGS framework is purely based on
MLPs, where structural information is only implicitly used to guide dual
knowledge self-distillation between the target node and its neighborhood. As a
result, TGS enjoys the benefits of graph topology awareness in training but is
free from data dependency in inference. Extensive experiments have shown that
the performance of vanilla MLPs can be greatly improved with dual
self-distillation, e.g., TGS improves over vanilla MLPs by 15.54% on average
and outperforms state-of-the-art GKD algorithms on six real-world datasets. In
terms of inference speed, TGS infers 75X-89X faster than existing GNNs and
16X-25X faster than classical inference acceleration methods.
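
The abstract leaves the training details implicit, so here is a minimal PyTorch-style sketch of what MLP-only dual self-distillation could look like: a single MLP produces per-node logits, and a row-normalized adjacency is used only at training time to form a neighborhood-averaged soft target, with KL terms in both directions between the target node and its neighborhood. Everything below (TargetMLP, dual_self_distill_loss, tau, lam, the exact loss form) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetMLP(nn.Module):
    """Plain MLP used at both training and inference time (no message passing)."""
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hid_dim, n_classes),
        )

    def forward(self, x):
        return self.net(x)

def dual_self_distill_loss(logits, adj_norm, tau=1.0):
    """Bidirectional KL between each node's predictive distribution and the
    aggregated distribution of its neighborhood. adj_norm is a row-normalized
    sparse adjacency that is only ever touched during training."""
    p_node = F.softmax(logits / tau, dim=-1)
    p_neigh = torch.sparse.mm(adj_norm, p_node)  # neighborhood-averaged prediction
    kl_node_to_neigh = F.kl_div(torch.log(p_node.clamp_min(1e-10)),
                                p_neigh.detach(), reduction="batchmean")
    kl_neigh_to_node = F.kl_div(torch.log(p_neigh.clamp_min(1e-10)),
                                p_node.detach(), reduction="batchmean")
    return kl_node_to_neigh + kl_neigh_to_node

def train_step(model, opt, x, y, train_mask, adj_norm, lam=1.0):
    """One step: supervised loss on labeled nodes + dual self-distillation."""
    model.train()
    logits = model(x)
    loss = F.cross_entropy(logits[train_mask], y[train_mask]) \
           + lam * dual_self_distill_loss(logits, adj_norm)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

At inference time the model is just `model(x)` on raw node features; the adjacency is never fetched, which is where the latency advantage over GNNs comes from.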
Related papers
- Teach Harder, Learn Poorer: Rethinking Hard Sample Distillation for GNN-to-MLP Knowledge Distillation [56.912354708167534]
To bridge the gap between Graph Neural Networks (GNNs) and lightweight Multi-Layer Perceptrons (MLPs), GNN-to-MLP Knowledge Distillation (KD) distills knowledge from a well-trained teacher GNN into a student MLP.
This paper proposes a simple yet effective Hardness-aware GNN-to-MLP Distillation (HGMD) framework.
arXiv Detail & Related papers (2024-07-20T06:13:00Z)
- VQGraph: Rethinking Graph Representation Space for Bridging GNNs and MLPs [97.63412451659826]
VQGraph learns a structure-aware tokenizer on graph data that can encode each node's local substructure as a discrete code.
VQGraph achieves new state-of-the-art performance on GNN-to-MLP distillation in both transductive and inductive settings.
arXiv Detail & Related papers (2023-08-04T02:58:08Z)
- Extracting Low-/High-Frequency Knowledge from Graph Neural Networks and Injecting it into MLPs: An Effective GNN-to-MLP Distillation Framework [36.160251860788314]
We propose an efficient Full-Frequency GNN-to-MLP (FF-G2M) distillation framework.
We factorize the knowledge learned by GNNs into low- and high-frequency components in the spectral domain; a minimal sketch of this decomposition is given after this list.
We identify a potential information drowning problem for existing GNN-to-MLP distillation.
arXiv Detail & Related papers (2023-05-18T06:57:06Z)
- RELIANT: Fair Knowledge Distillation for Graph Neural Networks [39.22568244059485]
Graph Neural Networks (GNNs) have shown satisfactory performance on various graph learning tasks.
Knowledge Distillation (KD) is a common solution to compress GNNs.
We propose a principled framework named RELIANT to mitigate the bias exhibited by the student model.
arXiv Detail & Related papers (2023-01-03T15:21:24Z)
- Teaching Yourself: Graph Self-Distillation on Neighborhood for Node Classification [42.840122801915996]
We propose a Graph Self-Distillation on Neighborhood (GSDN) framework to reduce the gap between GNNs and MLPs.
GSDN infers 75X-89X faster than existing GNNs and 16X-25X faster than other inference acceleration methods.
arXiv Detail & Related papers (2022-10-05T08:35:34Z)
- Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation [34.676755383361005]
Graph-less Neural Networks (GLNNs) have no inference graph dependency.
We show that GLNNs achieve competitive performance while inferring 146X-273X faster than GNNs and 14X-27X faster than other acceleration methods.
A comprehensive analysis of GLNN shows when and why GLNN can achieve results competitive with GNNs, and suggests GLNN as a handy choice for latency-constrained applications.
arXiv Detail & Related papers (2021-10-17T05:16:58Z)
- Graph-Free Knowledge Distillation for Graph Neural Networks [30.38128029453977]
We propose the first dedicated approach to distilling knowledge from a graph neural network without graph data.
The proposed graph-free KD (GFKD) learns graph topology structures for knowledge transfer by modeling them with a multinomial distribution.
We provide the strategies for handling different types of prior knowledge in the graph data or the GNNs.
arXiv Detail & Related papers (2021-05-16T21:38:24Z)
- On Self-Distilling Graph Neural Network [64.00508355508106]
We propose the first teacher-free knowledge distillation method for GNNs, termed GNN Self-Distillation (GNN-SD).
The method is built upon the proposed neighborhood discrepancy rate (NDR), which quantifies the non-smoothness of the embedded graph in an efficient way; one plausible instantiation is sketched after this list.
We also summarize a generic GNN-SD framework that could be exploited to induce other distillation strategies.
arXiv Detail & Related papers (2020-11-04T12:29:33Z)
- GPT-GNN: Generative Pre-Training of Graph Neural Networks [93.35945182085948]
Graph neural networks (GNNs) have been demonstrated to be powerful in modeling graph-structured data.
We present the GPT-GNN framework to initialize GNNs by generative pre-training.
We show that GPT-GNN significantly outperforms state-of-the-art GNN models without pre-training by up to 9.1% across various downstream tasks.
arXiv Detail & Related papers (2020-06-27T20:12:33Z)
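
For the FF-G2M entry above, the low-/high-frequency factorization has a simple spatial-domain reading worth sketching: the low-frequency part of a node representation is (roughly) its neighborhood average, and the high-frequency part is the residual against that average; the two components can then be distilled separately from teacher to student. This is a hedged illustration of that reading, not the paper's exact formulation; `adj_norm`, `low_high_decompose`, and `full_frequency_kd` are assumed names.

```python
import torch
import torch.nn.functional as F

def low_high_decompose(h, adj_norm):
    """h: [N, d] node representations; adj_norm: row-normalized sparse adjacency.
    Low frequency = neighborhood-smoothed signal, high frequency = residual,
    so that h = low + high."""
    low = torch.sparse.mm(adj_norm, h)
    high = h - low
    return low, high

def full_frequency_kd(h_teacher, h_student, adj_norm):
    """Distill low- and high-frequency components separately (teacher detached)."""
    low_t, high_t = low_high_decompose(h_teacher, adj_norm)
    low_s, high_s = low_high_decompose(h_student, adj_norm)
    return F.mse_loss(low_s, low_t.detach()) + F.mse_loss(high_s, high_t.detach())
```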
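
For the GNN-SD entry, the abstract only says that the neighborhood discrepancy rate (NDR) quantifies non-smoothness of the embedded graph. One plausible instantiation, given purely as an assumption rather than the paper's exact definition, scores each node by how much its embedding deviates from the aggregate of its neighbors' embeddings:

```python
import torch
import torch.nn.functional as F

def neighborhood_discrepancy(h, adj_norm):
    """Illustrative NDR-style score: 1 - cosine similarity between each node's
    embedding and its neighborhood-aggregated embedding (higher = less smooth).
    A proxy under stated assumptions, not GNN-SD's exact formula."""
    h_neigh = torch.sparse.mm(adj_norm, h)
    return 1.0 - F.cosine_similarity(h, h_neigh, dim=-1)  # shape [N]
```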