Diffusion-Assisted Distillation for Self-Supervised Graph Representation Learning with MLPs
- URL: http://arxiv.org/abs/2510.04241v1
- Date: Sun, 05 Oct 2025 15:11:55 GMT
- Title: Diffusion-Assisted Distillation for Self-Supervised Graph Representation Learning with MLPs
- Authors: Seong Jin Ahn, Myoung-Ho Kim
- Abstract summary: For large-scale applications, there is growing interest in replacing Graph Neural Networks (GNNs) with lightweight Multi-Layer Perceptrons (MLPs). This paper proposes a new distillation method to bridge the huge capacity gap between GNNs and MLPs in self-supervised graph representation learning. The proposed method employs a denoising diffusion model as a teacher assistant to better distill the knowledge from the teacher GNN into the student MLP.
- Score: 3.595215303316357
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For large-scale applications, there is growing interest in replacing Graph Neural Networks (GNNs) with lightweight Multi-Layer Perceptrons (MLPs) via knowledge distillation. However, distilling GNNs for self-supervised graph representation learning into MLPs is more challenging. This is because the performance of self-supervised learning depends more strongly on the model's inductive bias than that of supervised learning. This motivates us to design a new distillation method to bridge the huge capacity gap between GNNs and MLPs in self-supervised graph representation learning. In this paper, we propose \textbf{D}iffusion-\textbf{A}ssisted \textbf{D}istillation for \textbf{S}elf-supervised \textbf{G}raph representation learning with \textbf{M}LPs (DAD-SGM). The proposed method employs a denoising diffusion model as a teacher assistant to better distill the knowledge from the teacher GNN into the student MLP. This approach enhances the generalizability and robustness of MLPs in self-supervised graph representation learning. Extensive experiments demonstrate that DAD-SGM distills the knowledge of self-supervised GNNs more effectively than state-of-the-art GNN-to-MLP distillation methods. Our implementation is available at https://github.com/SeongJinAhn/DAD-SGM.
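The abstract does not describe the training procedure in detail, so the following is only a minimal, hypothetical sketch (PyTorch) of how a denoising diffusion model might act as a teacher assistant between a frozen self-supervised teacher GNN and an MLP student. The class names (`StudentMLP`, `DiffusionAssistant`), the conditioning on raw node features, the noise schedule, and the combined loss are assumptions for illustration, not the authors' implementation; the actual code is in the linked repository.

```python
# Hypothetical sketch of diffusion-assisted GNN-to-MLP distillation, inferred
# from the abstract only; the real implementation is in the DAD-SGM repository.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentMLP(nn.Module):
    """Lightweight student that sees only node features (no graph structure)."""
    def __init__(self, in_dim, hid_dim, emb_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                                 nn.Linear(hid_dim, emb_dim))

    def forward(self, x):
        return self.net(x)

class DiffusionAssistant(nn.Module):
    """Denoiser trained to predict the noise added to teacher GNN embeddings,
    conditioned on the raw node features and the diffusion timestep."""
    def __init__(self, emb_dim, feat_dim, hid_dim=256, max_t=1000):
        super().__init__()
        self.max_t = max_t
        self.net = nn.Sequential(nn.Linear(emb_dim + feat_dim + 1, hid_dim),
                                 nn.ReLU(), nn.Linear(hid_dim, emb_dim))

    def forward(self, z_t, x, t):
        t_emb = t.float().unsqueeze(-1) / self.max_t  # crude timestep encoding
        return self.net(torch.cat([z_t, x, t_emb], dim=-1))

def q_sample(z0, t, alphas_cumprod):
    """Forward diffusion: noise the clean teacher embeddings at timestep t."""
    a_bar = alphas_cumprod[t].unsqueeze(-1)
    noise = torch.randn_like(z0)
    return a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * noise, noise

def assistant_loss(assistant, teacher_emb, x, alphas_cumprod):
    """Stage 1 (assumed): train the assistant to predict the injected noise."""
    t = torch.randint(0, len(alphas_cumprod), (x.size(0),))
    z_t, noise = q_sample(teacher_emb, t, alphas_cumprod)
    return F.mse_loss(assistant(z_t, x, t), noise)

def student_loss(student, assistant, teacher_emb, x, alphas_cumprod, t_small=50):
    """Stage 2 (assumed): treat the student output as a noisy estimate of the
    teacher embedding and pull it toward the teacher and the denoised target."""
    z_s = student(x)
    t = torch.full((x.size(0),), t_small, dtype=torch.long)
    with torch.no_grad():
        a_bar = alphas_cumprod[t].unsqueeze(-1)
        pred_noise = assistant(z_s, x, t)
        denoised = (z_s - (1.0 - a_bar).sqrt() * pred_noise) / a_bar.sqrt()
    return F.mse_loss(z_s, teacher_emb) + F.mse_loss(z_s, denoised)
```

Under this reading, the frozen self-supervised teacher GNN supplies target embeddings, the assistant smooths the optimization target for the student, and only `StudentMLP` is needed at inference time.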
Related papers
- Heuristic Methods are Good Teachers to Distill MLPs for Graph Link Prediction [61.70012924088756]
Distilling Graph Neural Network (GNN) teachers into Multi-Layer Perceptron (MLP) students has emerged as an effective approach to achieve strong performance. However, existing distillation methods only use standard GNNs and overlook alternative teachers such as specialized models for link prediction (GNN4LP) and heuristic methods (e.g., common neighbors). This paper first explores the impact of different teachers in GNN-to-MLP distillation; we find that stronger teachers do not always produce stronger students, while weaker heuristic methods can teach MLPs to near-GNN performance with drastically reduced training costs.
arXiv Detail & Related papers (2025-04-08T16:35:11Z) - Teaching MLPs to Master Heterogeneous Graph-Structured Knowledge for Efficient and Accurate Inference [53.38082028252104]
We introduce HG2M and HG2M+ to combine both HGNNs' superior performance and MLPs' efficient inference. HG2M directly trains student MLPs with node features as input and soft labels from teacher HGNNs as targets (a generic sketch of this soft-label distillation pattern appears after this list). HG2Ms demonstrate a 379.24$\times$ speedup in inference over HGNNs on the large-scale IGB-3M-19 dataset.
arXiv Detail & Related papers (2024-11-21T11:39:09Z) - Teach Harder, Learn Poorer: Rethinking Hard Sample Distillation for GNN-to-MLP Knowledge Distillation [56.912354708167534]
To bridge Graph Neural Networks (GNNs) and lightweight Multi-Layer Perceptrons (MLPs), GNN-to-MLP Knowledge Distillation (KD) proposes to distill knowledge from a well-trained teacher GNN into a student MLP.
This paper proposes a simple yet effective Hardness-aware GNN-to-MLP Distillation (HGMD) framework.
arXiv Detail & Related papers (2024-07-20T06:13:00Z) - A Teacher-Free Graph Knowledge Distillation Framework with Dual Self-Distillation [58.813991312803246]
We propose a Teacher-Free Graph Self-Distillation (TGS) framework that requires neither a teacher model nor GNNs in either training or inference.
TGS enjoys the benefits of graph topology awareness in training but is free from data dependency in inference.
arXiv Detail & Related papers (2024-03-06T05:52:13Z) - Propagate & Distill: Towards Effective Graph Learners Using Propagation-Embracing MLPs [9.731314045194495]
We train a student MLP by knowledge distillation from a teacher graph neural network (GNN).
Inspired by GNNs that separate feature transformation $T$ and propagation $\Pi$, we re-frame the distillation process as making the student learn both $T$ and $\Pi$.
We propose Propagate & Distill (P&D), which propagates the output of the teacher before distillation and can be interpreted as an approximate process of inverse propagation (see the sketch after this list).
arXiv Detail & Related papers (2023-11-29T16:26:24Z) - Unveiling the Unseen Potential of Graph Learning through MLPs: Effective Graph Learners Using Propagation-Embracing MLPs [9.731314045194495]
We train a student MLP by knowledge distillation from a teacher graph neural network (GNN).
Inspired by GNNs that separate transformation $T$ and propagation $\Pi$, we re-frame the KD process as enabling the student to explicitly learn both $T$ and $\Pi$.
We propose Propagate & Distill (P&D), which propagates the output of the teacher GNN before KD and can be interpreted as an approximate process of the inverse propagation $\Pi^{-1}$.
arXiv Detail & Related papers (2023-11-20T13:39:19Z) - VQGraph: Rethinking Graph Representation Space for Bridging GNNs and MLPs [97.63412451659826]
VQGraph learns a structure-aware tokenizer on graph data that can encode each node's local substructure as a discrete code.
VQGraph achieves new state-of-the-art performance on GNN-to-MLP distillation in both transductive and inductive settings.
arXiv Detail & Related papers (2023-08-04T02:58:08Z) - Extracting Low-/High- Frequency Knowledge from Graph Neural Networks and Injecting it into MLPs: An Effective GNN-to-MLP Distillation Framework [36.160251860788314]
We propose an efficient Full-Frequency GNN-to-MLP (FFG2M) distillation framework.
We factorize the knowledge learned by GNNs into low- and high-frequency components in the spectral domain.
We identify a potential information drowning problem for existing GNN-to-MLP distillation.
arXiv Detail & Related papers (2023-05-18T06:57:06Z) - On Self-Distilling Graph Neural Network [64.00508355508106]
We propose the first teacher-free knowledge distillation method for GNNs, termed GNN Self-Distillation (GNN-SD)
The method is built upon the proposed neighborhood discrepancy rate (NDR), which quantifies the non-smoothness of the embedded graph in an efficient way.
We also summarize a generic GNN-SD framework that could be exploited to induce other distillation strategies.
arXiv Detail & Related papers (2020-11-04T12:29:33Z) - Iterative Graph Self-Distillation [161.04351580382078]
We propose a novel unsupervised graph learning paradigm called Iterative Graph Self-Distillation (IGSD)
IGSD iteratively performs the teacher-student distillation with graph augmentations.
We show that we achieve significant and consistent performance gain on various graph datasets in both unsupervised and semi-supervised settings.
arXiv Detail & Related papers (2020-10-23T18:37:06Z)
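Several of the entries above (HG2M, Propagate & Distill, and others) share a common core: the MLP student takes only node features and is trained to match soft targets produced by a GNN teacher, optionally after the teacher's outputs have been propagated over the graph. Below is a minimal, non-paper-specific sketch of that pattern; the function names and the simple mean-aggregation propagation are illustrative assumptions, not any paper's exact procedure.

```python
# Generic GNN-to-MLP distillation sketch: soft-label matching (HG2M-style)
# with optional P&D-style propagation of the teacher outputs beforehand.
import torch
import torch.nn.functional as F

def propagate(teacher_out, edge_index, num_nodes, k=2):
    """Smooth teacher outputs over the graph with k rounds of mean aggregation
    (a crude stand-in for propagation with a normalized adjacency matrix)."""
    src, dst = edge_index                       # edge_index: 2 x num_edges (long)
    deg = torch.zeros(num_nodes).index_add_(
        0, dst, torch.ones(dst.size(0))).clamp(min=1.0).unsqueeze(-1)
    out = teacher_out
    for _ in range(k):
        out = torch.zeros_like(out).index_add_(0, dst, out[src]) / deg
    return out

def distill_loss(student_logits, teacher_logits, tau=2.0):
    """Soft-label distillation: KL divergence between teacher and student
    output distributions at temperature tau."""
    t = F.softmax(teacher_logits / tau, dim=-1)
    s = F.log_softmax(student_logits / tau, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * tau * tau

# Usage (shapes only, hypothetical models):
#   with torch.no_grad():
#       targets = propagate(teacher_gnn_logits, edge_index, x.size(0))
#   loss = distill_loss(student_mlp(x), targets)
```

Because any propagation happens only on the teacher side during training, the student stays graph-free at inference time, which is the efficiency argument these papers share.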