On Self-Distilling Graph Neural Network
- URL: http://arxiv.org/abs/2011.02255v2
- Date: Fri, 30 Apr 2021 04:31:53 GMT
- Title: On Self-Distilling Graph Neural Network
- Authors: Yuzhao Chen, Yatao Bian, Xi Xiao, Yu Rong, Tingyang Xu, Junzhou Huang
- Abstract summary: We propose the first teacher-free knowledge distillation method for GNNs, termed GNN Self-Distillation (GNN-SD).
The method is built upon the proposed neighborhood discrepancy rate (NDR), which quantifies the non-smoothness of the embedded graph in an efficient way.
We also summarize a generic GNN-SD framework that could be exploited to induce other distillation strategies.
- Score: 64.00508355508106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the teacher-student knowledge distillation framework has
demonstrated its potential in training Graph Neural Networks (GNNs). However,
due to the difficulty of training over-parameterized GNN models, one may not
easily obtain a satisfactory teacher model for distillation. Furthermore, the
inefficient training process of teacher-student knowledge distillation also
impedes its applications in GNN models. In this paper, we propose the first
teacher-free knowledge distillation method for GNNs, termed GNN
Self-Distillation (GNN-SD), that serves as a drop-in replacement of the
standard training process. The method is built upon the proposed neighborhood
discrepancy rate (NDR), which quantifies the non-smoothness of the embedded
graph in an efficient way. Based on this metric, we propose the adaptive
discrepancy retaining (ADR) regularizer to empower the transferability of
knowledge that maintains high neighborhood discrepancy across GNN layers. We
also summarize a generic GNN-SD framework that could be exploited to induce
other distillation strategies. Experiments further prove the effectiveness and
generalization of our approach, as it brings: 1) state-of-the-art GNN
distillation performance with less training cost, and 2) consistent and
considerable performance enhancement for various popular backbones.
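To make the mechanism described in the abstract concrete, here is a minimal sketch of how an NDR-style metric and an ADR-style regularizer could look. The abstract does not give the exact formulas, so the code below assumes NDR is one minus the cosine similarity between a node's embedding and its mean-aggregated neighbor embedding, and that ADR penalizes a drop in NDR from one layer to the next; the function names and the loss form are illustrative assumptions, not the paper's definitive implementation.

```python
import torch
import torch.nn.functional as F


def neighborhood_discrepancy_rate(h, edge_index):
    """NDR sketch (assumed form): one minus the cosine similarity between a
    node's embedding and the mean of its neighbors' embeddings at that layer.

    h: [N, d] node embeddings produced by one GNN layer.
    edge_index: [2, E] COO edge list (row 0 = source, row 1 = destination).
    """
    src, dst = edge_index
    # Mean-aggregate neighbor embeddings for every destination node.
    agg = torch.zeros_like(h).index_add_(0, dst, h[src])
    deg = torch.zeros(h.size(0), device=h.device).index_add_(
        0, dst, torch.ones(dst.size(0), device=h.device))
    agg = agg / deg.clamp(min=1.0).unsqueeze(-1)
    return 1.0 - F.cosine_similarity(h, agg, dim=-1)  # shape [N]


def adr_regularizer(layer_embeddings, edge_index, alpha=0.1):
    """ADR-style penalty sketch: discourage a deeper layer's NDR from falling
    below the (detached) NDR of the preceding, shallower layer, so that
    neighborhood discrepancy is retained across layers."""
    loss, prev_ndr = 0.0, None
    for h in layer_embeddings:  # one [N, d] tensor per GNN layer
        ndr = neighborhood_discrepancy_rate(h, edge_index)
        if prev_ndr is not None:
            loss = loss + F.relu(prev_ndr.detach() - ndr).mean()
        prev_ndr = ndr
    return alpha * loss
```

In a training loop this term would simply be added to the task loss (e.g. `loss = task_loss + adr_regularizer(hidden_states, edge_index)`), which is consistent with the abstract's claim that GNN-SD is a drop-in replacement for the standard training process.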
Related papers
- Teach Harder, Learn Poorer: Rethinking Hard Sample Distillation for GNN-to-MLP Knowledge Distillation [56.912354708167534]
GNN-to-MLP Knowledge Distillation (KD) proposes to distill knowledge from a well-trained teacher GNN into a lightweight student Multi-Layer Perceptron (MLP).
This paper proposes a simple yet effective Hardness-aware GNN-to-MLP Distillation (HGMD) framework.
arXiv Detail & Related papers (2024-07-20T06:13:00Z)
- Self-Distillation Learning Based on Temporal-Spatial Consistency for Spiking Neural Networks [3.7748662901422807]
Spiking neural networks (SNNs) have attracted considerable attention for their event-driven, low-power characteristics and high biological interpretability.
Recent research has improved the performance of the SNN model with a pre-trained teacher model.
In this paper, we explore cost-effective self-distillation learning of SNNs to circumvent these concerns.
arXiv Detail & Related papers (2024-06-12T04:30:40Z)
- Online GNN Evaluation Under Test-time Graph Distribution Shifts [92.4376834462224]
A new research problem, online GNN evaluation, aims to provide valuable insights into well-trained GNNs' ability to generalize to real-world unlabeled graphs.
We develop an effective learning behavior discrepancy score, dubbed LeBeD, to estimate the test-time generalization errors of well-trained GNN models.
arXiv Detail & Related papers (2024-03-15T01:28:08Z)
- A Teacher-Free Graph Knowledge Distillation Framework with Dual Self-Distillation [58.813991312803246]
We propose a Teacher-Free Graph Self-Distillation (TGS) framework that requires neither a teacher model nor GNNs during training or inference.
TGS enjoys the benefits of graph topology awareness in training but is free from data dependency in inference.
arXiv Detail & Related papers (2024-03-06T05:52:13Z)
- NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation [19.93322471957759]
We propose a framework called NetDistiller to boost the achievable accuracy of tiny neural networks (TNNs).
The framework treats a TNN as a sub-network of a weight-sharing teacher constructed by expanding the number of channels of the TNN.
Our code is available at https://github.com/GATECH-EIC/NetDistiller.
arXiv Detail & Related papers (2023-10-24T04:27:51Z)
- Label Deconvolution for Node Representation Learning on Large-scale Attributed Graphs against Learning Bias [75.44877675117749]
We propose an efficient label regularization technique, namely Label Deconvolution (LD), to alleviate the learning bias by a novel and highly scalable approximation to the inverse mapping of GNNs.
Experiments demonstrate that LD significantly outperforms state-of-the-art methods on the Open Graph Benchmark datasets.
arXiv Detail & Related papers (2023-09-26T13:09:43Z)
- Shared Growth of Graph Neural Networks via Prompted Free-direction Knowledge Distillation [39.35619721100205]
We propose the first Free-direction Knowledge Distillation framework via reinforcement learning for graph neural networks (GNNs).
Our core idea is to collaboratively learn two shallower GNNs to exchange knowledge between them.
Experiments on five benchmark datasets demonstrate that our approaches outperform the base GNNs by a large margin.
arXiv Detail & Related papers (2023-07-02T10:03:01Z)
- RELIANT: Fair Knowledge Distillation for Graph Neural Networks [39.22568244059485]
Graph Neural Networks (GNNs) have shown satisfactory performance on various graph learning tasks.
Knowledge Distillation (KD) is a common solution to compress GNNs.
We propose a principled framework named RELIANT to mitigate the bias exhibited by the student model.
arXiv Detail & Related papers (2023-01-03T15:21:24Z)
- Distilling Knowledge from Graph Convolutional Networks [146.71503336770886]
Existing knowledge distillation methods focus on convolutional neural networks (CNNs).
We propose the first dedicated approach to distilling knowledge from a pre-trained graph convolutional network (GCN) model.
We show that our method achieves the state-of-the-art knowledge distillation performance for GCN models.
arXiv Detail & Related papers (2020-03-23T18:23:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.