On Self-Distilling Graph Neural Network
- URL: http://arxiv.org/abs/2011.02255v2
- Date: Fri, 30 Apr 2021 04:31:53 GMT
- Title: On Self-Distilling Graph Neural Network
- Authors: Yuzhao Chen, Yatao Bian, Xi Xiao, Yu Rong, Tingyang Xu, Junzhou Huang
- Abstract summary: We propose the first teacher-free knowledge distillation method for GNNs, termed GNN Self-Distillation (GNN-SD).
The method is built upon the proposed neighborhood discrepancy rate (NDR), which quantifies the non-smoothness of the embedded graph in an efficient way.
We also summarize a generic GNN-SD framework that could be exploited to induce other distillation strategies.
- Score: 64.00508355508106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the teacher-student knowledge distillation framework has
demonstrated its potential in training Graph Neural Networks (GNNs). However,
due to the difficulty of training over-parameterized GNN models, one may not
easily obtain a satisfactory teacher model for distillation. Furthermore, the
inefficient training process of teacher-student knowledge distillation also
impedes its applications in GNN models. In this paper, we propose the first
teacher-free knowledge distillation method for GNNs, termed GNN
Self-Distillation (GNN-SD), that serves as a drop-in replacement of the
standard training process. The method is built upon the proposed neighborhood
discrepancy rate (NDR), which quantifies the non-smoothness of the embedded
graph in an efficient way. Based on this metric, we propose the adaptive
discrepancy retaining (ADR) regularizer to empower the transferability of
knowledge that maintains high neighborhood discrepancy across GNN layers. We
also summarize a generic GNN-SD framework that could be exploited to induce
other distillation strategies. Experiments further prove the effectiveness and
generalization of our approach, as it brings: 1) state-of-the-art GNN
distillation performance with less training cost, and 2) consistent and
considerable performance enhancement for various popular backbones.
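To make the mechanism described in the abstract concrete, here is a minimal sketch of how an NDR-style metric and an ADR-style regularizer could look. The abstract does not give the exact formulas, so the code below assumes NDR is one minus the cosine similarity between a node's embedding and its mean-aggregated neighbor embedding, and that ADR penalizes a drop in NDR from one layer to the next; the function names and the loss form are illustrative assumptions, not the paper's definitive implementation.

```python
import torch
import torch.nn.functional as F


def neighborhood_discrepancy_rate(h, edge_index):
    """NDR sketch (assumed form): one minus the cosine similarity between a
    node's embedding and the mean of its neighbors' embeddings at that layer.

    h: [N, d] node embeddings produced by one GNN layer.
    edge_index: [2, E] COO edge list (row 0 = source, row 1 = destination).
    """
    src, dst = edge_index
    # Mean-aggregate neighbor embeddings for every destination node.
    agg = torch.zeros_like(h).index_add_(0, dst, h[src])
    deg = torch.zeros(h.size(0), device=h.device).index_add_(
        0, dst, torch.ones(dst.size(0), device=h.device))
    agg = agg / deg.clamp(min=1.0).unsqueeze(-1)
    return 1.0 - F.cosine_similarity(h, agg, dim=-1)  # shape [N]


def adr_regularizer(layer_embeddings, edge_index, alpha=0.1):
    """ADR-style penalty sketch: discourage a deeper layer's NDR from falling
    below the (detached) NDR of the preceding, shallower layer, so that
    neighborhood discrepancy is retained across layers."""
    loss, prev_ndr = 0.0, None
    for h in layer_embeddings:  # one [N, d] tensor per GNN layer
        ndr = neighborhood_discrepancy_rate(h, edge_index)
        if prev_ndr is not None:
            loss = loss + F.relu(prev_ndr.detach() - ndr).mean()
        prev_ndr = ndr
    return alpha * loss
```

In a training loop this term would simply be added to the task loss (e.g. `loss = task_loss + adr_regularizer(hidden_states, edge_index)`), which is consistent with the abstract's claim that GNN-SD is a drop-in replacement for the standard training process.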
Related papers
- Teach Harder, Learn Poorer: Rethinking Hard Sample Distillation for GNN-to-MLP Knowledge Distillation [56.912354708167534]
GNN-to-MLP Knowledge Distillation (KD) proposes to distill knowledge from a well-trained teacher GNN into a lightweight student Multi-Layer Perceptron (MLP).
This paper proposes a simple yet effective Hardness-aware GNN-to-MLP Distillation (HGMD) framework.
arXiv Detail & Related papers (2024-07-20T06:13:00Z)
- Self-Distillation Learning Based on Temporal-Spatial Consistency for Spiking Neural Networks [3.7748662901422807]
Spiking neural networks (SNNs) have attracted considerable attention for their event-driven, low-power characteristics and high biological interpretability.
Recent research has improved the performance of the SNN model with a pre-trained teacher model.
In this paper, we explore cost-effective self-distillation learning of SNNs to circumvent these concerns.
arXiv Detail & Related papers (2024-06-12T04:30:40Z)
- Online GNN Evaluation Under Test-time Graph Distribution Shifts [92.4376834462224]
A new research problem, online GNN evaluation, aims to provide valuable insights into well-trained GNNs' ability to generalize to real-world unlabeled graphs.
We develop an effective learning behavior discrepancy score, dubbed LeBeD, to estimate the test-time generalization errors of well-trained GNN models.
arXiv Detail & Related papers (2024-03-15T01:28:08Z)
- A Teacher-Free Graph Knowledge Distillation Framework with Dual Self-Distillation [58.813991312803246]
We propose a Teacher-Free Graph Self-Distillation (TGS) framework that requires neither a teacher model nor GNNs during training or inference.
TGS enjoys the benefits of graph topology awareness in training but is free from data dependency in inference.
arXiv Detail & Related papers (2024-03-06T05:52:13Z)
- NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation [19.93322471957759]
We propose a framework called NetDistiller to boost the achievable accuracy of tiny neural networks (TNNs).
The framework treats a TNN as a sub-network of a weight-sharing teacher constructed by expanding the number of channels of the TNN.
Our code is available at https://github.com/GATECH-EIC/NetDistiller.
arXiv Detail & Related papers (2023-10-24T04:27:51Z)
- Label Deconvolution for Node Representation Learning on Large-scale Attributed Graphs against Learning Bias [75.44877675117749]
We propose an efficient label regularization technique, namely Label Deconvolution (LD), to alleviate the learning bias by a novel and highly scalable approximation to the inverse mapping of GNNs.
Experiments demonstrate that LD significantly outperforms state-of-the-art methods on the Open Graph Benchmark datasets.
arXiv Detail & Related papers (2023-09-26T13:09:43Z)
- Shared Growth of Graph Neural Networks via Prompted Free-direction Knowledge Distillation [39.35619721100205]
We propose the first Free-direction Knowledge Distillation framework via reinforcement learning for graph neural networks (GNNs).
Our core idea is to collaboratively learn two shallower GNNs to exchange knowledge between them.
Experiments on five benchmark datasets demonstrate that our approaches outperform the base GNNs by a large margin.
arXiv Detail & Related papers (2023-07-02T10:03:01Z)
- RELIANT: Fair Knowledge Distillation for Graph Neural Networks [39.22568244059485]
Graph Neural Networks (GNNs) have shown satisfactory performance on various graph learning tasks.
Knowledge Distillation (KD) is a common solution to compress GNNs.
We propose a principled framework named RELIANT to mitigate the bias exhibited by the student model.
arXiv Detail & Related papers (2023-01-03T15:21:24Z)
- Distilling Knowledge from Graph Convolutional Networks [146.71503336770886]
Existing knowledge distillation methods focus on convolutional neural networks (CNNs).
We propose the first dedicated approach to distilling knowledge from a pre-trained graph convolutional network (GCN) model.
We show that our method achieves the state-of-the-art knowledge distillation performance for GCN models.
arXiv Detail & Related papers (2020-03-23T18:23:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.