FreeKD: Free-direction Knowledge Distillation for Graph Neural Networks
- URL: http://arxiv.org/abs/2206.06561v4
- Date: Mon, 27 Mar 2023 05:59:30 GMT
- Title: FreeKD: Free-direction Knowledge Distillation for Graph Neural Networks
- Authors: Kaituo Feng, Changsheng Li, Ye Yuan, Guoren Wang
- Abstract summary: It is difficult to train a satisfactory teacher GNN due to the well-known over-parameterization and over-smoothing issues.
We propose the first Free-direction Knowledge Distillation framework via Reinforcement learning for GNNs, called FreeKD.
Our FreeKD is a general and principled framework that is naturally compatible with GNNs of different architectures.
- Score: 31.980564414833175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation (KD) has demonstrated its effectiveness in boosting
the performance of graph neural networks (GNNs), where the goal is to distill
knowledge from a deeper teacher GNN into a shallower student GNN. However, it
is actually difficult to train a satisfactory teacher GNN due to the well-known
over-parameterization and over-smoothing issues, leading to invalid knowledge
transfer in practical applications. In this paper, we propose the first
Free-direction Knowledge Distillation framework via Reinforcement learning for
GNNs, called FreeKD, which no longer requires a deeper, well-optimized teacher
GNN. The core idea of our work is to collaboratively build two shallower GNNs
that exchange knowledge with each other via reinforcement learning in a
hierarchical way. As we observe that a typical GNN model often performs better
on some nodes and worse on others during training, we devise a dynamic and
free-direction knowledge transfer strategy that consists of two levels of
actions: 1) a node-level action determines the direction of knowledge transfer
between the corresponding nodes of the two networks; and 2) a structure-level
action determines which of the local structures generated by the node-level
actions should be propagated. In essence, FreeKD is a general and principled
framework that is naturally compatible with GNNs of different architectures.
Extensive experiments on five benchmark datasets demonstrate that FreeKD
outperforms the two base GNNs by a large margin and is effective for various
GNNs. More surprisingly, our FreeKD achieves comparable or even better
performance than traditional KD algorithms that distill knowledge from a
deeper and stronger teacher GNN.
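To make the two-level transfer strategy more concrete, here is a minimal PyTorch-style sketch of free-direction distillation between two peer GNNs. It replaces the paper's hierarchical reinforcement-learning agents with a simple per-node confidence heuristic (the network with the lower per-node loss acts as the local teacher) and omits the structure-level action entirely; the function name, the `kd_weight` and `temp` hyperparameters, and the heuristic itself are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def free_direction_step(gnn_a, gnn_b, opt_a, opt_b,
                        x, edge_index, y, train_mask,
                        temp=2.0, kd_weight=0.5):
    """One collaborative training step for two peer GNNs (illustrative only)."""
    logits_a = gnn_a(x, edge_index)              # [N, C]
    logits_b = gnn_b(x, edge_index)              # [N, C]

    # Per-node supervised losses; also reused below as a crude stand-in
    # for the paper's learned node-level (direction) decision.
    ce_a = F.cross_entropy(logits_a, y, reduction="none")
    ce_b = F.cross_entropy(logits_b, y, reduction="none")

    # Node-level "action" (heuristic): the network with the lower loss at
    # a node serves as the local teacher for that node.
    a_teaches = ce_a < ce_b                      # bool [N]

    log_p_a = F.log_softmax(logits_a / temp, dim=-1)
    log_p_b = F.log_softmax(logits_b / temp, dim=-1)
    p_a, p_b = log_p_a.exp().detach(), log_p_b.exp().detach()

    # Per-node KL terms for both transfer directions (soft targets detached).
    kd_to_b = F.kl_div(log_p_b, p_a, reduction="none").sum(-1)   # a -> b
    kd_to_a = F.kl_div(log_p_a, p_b, reduction="none").sum(-1)   # b -> a

    m = train_mask.float()
    loss_a = (ce_a * m).mean() + kd_weight * (kd_to_a * (~a_teaches).float() * m).mean()
    loss_b = (ce_b * m).mean() + kd_weight * (kd_to_b * a_teaches.float() * m).mean()

    opt_a.zero_grad(); opt_b.zero_grad()
    (loss_a + loss_b).backward()                 # the two computation graphs are disjoint
    opt_a.step(); opt_b.step()
    return loss_a.item(), loss_b.item()
```

In the actual framework, the node-level and structure-level decisions are produced by learned agents optimized with a reward signal rather than by this direct loss comparison.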
Related papers
- Collaborative Knowledge Distillation via a Learning-by-Education Node Community [19.54023115706067]
The Learning-by-Education Node Community (LENC) framework for Collaborative Knowledge Distillation (CKD) is presented.
LENC addresses the challenges of handling diverse training data distributions and the limitations of individual Deep Neural Network (DNN) node learning abilities.
It achieves state-of-the-art performance in on-line unlabelled CKD.
arXiv Detail & Related papers (2024-09-30T14:22:28Z)
- A Teacher-Free Graph Knowledge Distillation Framework with Dual Self-Distillation [58.813991312803246]
We propose a Teacher-Free Graph Self-Distillation (TGS) framework that does not require any teacher model or GNNs during both training and inference.
TGS enjoys the benefits of graph topology awareness in training but is free from data dependency in inference.
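The phrase "topology awareness in training but free from data dependency in inference" can be read as training a graph-free model whose predictions are regularized toward those of its neighbors, so the graph is only needed at training time. The sketch below is that loose reading, not the authors' TGS method; `mlp`, `lam`, and the stop-gradient choice are assumptions.

```python
import torch.nn.functional as F

def topology_regularized_loss(mlp, x, edge_index, y, train_mask, lam=0.5):
    """Graph-free forward pass; the graph only shapes the training loss."""
    logits = mlp(x)                                  # no edge_index needed here
    ce = F.cross_entropy(logits[train_mask], y[train_mask])
    src, dst = edge_index
    # Pull each node's distribution toward its neighbours' (stop-gradient),
    # injecting topology into training while inference stays feature-only.
    p_neigh = F.softmax(logits[src], dim=-1).detach()
    kd = F.kl_div(F.log_softmax(logits[dst], dim=-1), p_neigh,
                  reduction="batchmean")
    return ce + lam * kd
```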
arXiv Detail & Related papers (2024-03-06T05:52:13Z)
- Information Flow in Graph Neural Networks: A Clinical Triage Use Case [49.86931948849343]
Graph Neural Networks (GNNs) have gained popularity in healthcare and other domains due to their ability to process multi-modal and multi-relational graphs.
We investigate how the flow of embedding information within GNNs affects the prediction of links in Knowledge Graphs (KGs).
Our results demonstrate that incorporating domain knowledge into the GNN connectivity leads to better performance than using the same connectivity as the KG or allowing unconstrained embedding propagation.
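One simple way to "incorporate domain knowledge into the GNN connectivity", as described above, is to restrict message passing to a whitelist of relation types before training. This is an illustrative sketch of that general idea, not the paper's triage-specific construction; `edge_type` and `allowed_types` are assumed inputs.

```python
import torch

def constrain_edges(edge_index, edge_type, allowed_types):
    """Keep only edges whose relation type is permitted by domain knowledge.

    edge_index: [2, E] source/target node ids; edge_type: [E] relation ids.
    """
    allowed = torch.tensor(sorted(allowed_types), device=edge_type.device)
    mask = torch.isin(edge_type, allowed)
    return edge_index[:, mask], edge_type[mask]
```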
arXiv Detail & Related papers (2023-09-12T09:18:12Z)
- Shared Growth of Graph Neural Networks via Prompted Free-direction Knowledge Distillation [39.35619721100205]
We propose the first Free-direction Knowledge Distillation framework via reinforcement learning for graph neural networks (GNNs).
Our core idea is to collaboratively learn two shallower GNNs to exchange knowledge between them.
Experiments on five benchmark datasets demonstrate that our approaches outperform the base GNNs by a large margin.
arXiv Detail & Related papers (2023-07-02T10:03:01Z)
- Hierarchical Contrastive Learning Enhanced Heterogeneous Graph Neural Network [59.860534520941485]
Heterogeneous graph neural networks (HGNNs), an emerging technique, have shown a superior capacity for dealing with heterogeneous information networks (HINs).
Recently, contrastive learning, a self-supervised method, has become one of the most exciting learning paradigms, showing great potential when no labels are available.
In this paper, we study the problem of self-supervised HGNNs and propose a novel co-contrastive learning mechanism for HGNNs, named HeCo.
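As a rough illustration of a co-contrastive objective between two views of the same nodes (HeCo contrasts a network-schema view with a meta-path view), a symmetric InfoNCE loss could look like the sketch below; HeCo's actual positive-sample selection is more elaborate, and `tau` is an assumed temperature.

```python
import torch
import torch.nn.functional as F

def cross_view_contrastive(z1, z2, tau=0.5):
    """z1, z2: [N, d] embeddings of the same nodes from two views."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    sim = z1 @ z2.t() / tau                         # [N, N] similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    # Symmetric InfoNCE: a node's positive is itself in the other view.
    return 0.5 * (F.cross_entropy(sim, labels) +
                  F.cross_entropy(sim.t(), labels))
```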
arXiv Detail & Related papers (2023-04-24T16:17:21Z)
- Boosting Graph Neural Networks via Adaptive Knowledge Distillation [18.651451228086643]
Graph neural networks (GNNs) have shown remarkable performance on diverse graph mining tasks.
Knowledge distillation (KD) has been developed to combine the diverse knowledge from multiple models.
We propose a novel adaptive KD framework, called BGNN, which sequentially transfers knowledge from multiple GNNs into a student GNN.
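A minimal sketch of the "sequential transfer from multiple GNNs into one student" pattern, assuming pre-trained teachers: the student is distilled against each teacher in turn. BGNN's adaptive weighting is reduced to a fixed `kd_weight` here; all names and hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

def sequential_distill(student, teachers, optimizer, x, edge_index, y,
                       train_mask, epochs_per_teacher=100,
                       kd_weight=1.0, temp=2.0):
    for teacher in teachers:                        # teachers are pre-trained
        teacher.eval()
        with torch.no_grad():
            t_soft = F.softmax(teacher(x, edge_index) / temp, dim=-1)
        for _ in range(epochs_per_teacher):
            student.train()
            optimizer.zero_grad()
            s_logits = student(x, edge_index)
            ce = F.cross_entropy(s_logits[train_mask], y[train_mask])
            kd = F.kl_div(F.log_softmax(s_logits / temp, dim=-1),
                          t_soft, reduction="batchmean") * temp ** 2
            (ce + kd_weight * kd).backward()
            optimizer.step()
    return student
```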
arXiv Detail & Related papers (2022-10-12T04:48:50Z)
- Knowledge Enhanced Neural Networks for relational domains [83.9217787335878]
We focus on a specific method, KENN, a Neural-Symbolic architecture that injects prior logical knowledge into a neural network.
In this paper, we propose an extension of KENN for relational data.
arXiv Detail & Related papers (2022-05-31T13:00:34Z)
- On Self-Distilling Graph Neural Network [64.00508355508106]
We propose the first teacher-free knowledge distillation method for GNNs, termed GNN Self-Distillation (GNN-SD).
The method is built upon the proposed neighborhood discrepancy rate (NDR), which quantifies the non-smoothness of the embedded graph in an efficient way.
We also summarize a generic GNN-SD framework that could be exploited to induce other distillation strategies.
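The neighborhood discrepancy rate mentioned above quantifies how much a node's embedding deviates from its neighborhood. Below is a hedged sketch of one such discrepancy measure (cosine distance to the mean neighbor embedding); the exact NDR definition and the self-distillation loss built on it follow the paper, not this code.

```python
import torch
import torch.nn.functional as F

def neighborhood_discrepancy(h, edge_index):
    """h: [N, d] node embeddings; edge_index: [2, E] (src, dst) pairs."""
    src, dst = edge_index
    n = h.size(0)
    # Mean-aggregate neighbour embeddings for every destination node.
    agg = torch.zeros_like(h).index_add_(0, dst, h[src])
    deg = torch.zeros(n, device=h.device).index_add_(
        0, dst, torch.ones(src.size(0), device=h.device)).clamp(min=1)
    agg = agg / deg.unsqueeze(-1)
    # Larger values = less smooth: the node disagrees with its neighbourhood.
    return 1.0 - F.cosine_similarity(h, agg, dim=-1)    # [N]
```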
arXiv Detail & Related papers (2020-11-04T12:29:33Z)
- Distilling a Deep Neural Network into a Takagi-Sugeno-Kang Fuzzy Inference System [9.82399898215447]
Deep neural networks (DNNs) demonstrate great success in classification tasks.
However, they act as black boxes, and it is unclear how they make decisions in a particular classification task.
We propose to distill the knowledge from a DNN into a Takagi-Sugeno-Kang (TSK)-type fuzzy inference system (FIS).
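To make the idea of distilling into a TSK fuzzy system concrete, here is a hedged sketch: Gaussian rule antecedents, linear consequents, and a KL loss against the teacher's softened outputs. The layer shapes, rule count, and training loss are assumptions for illustration, not the paper's exact construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TSK(nn.Module):
    """Tiny TSK fuzzy classifier: Gaussian antecedents, linear consequents."""
    def __init__(self, in_dim, n_rules, n_classes):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_rules, in_dim))
        self.log_sigma = nn.Parameter(torch.zeros(n_rules, in_dim))
        self.consequent = nn.Linear(in_dim, n_rules * n_classes)
        self.n_rules, self.n_classes = n_rules, n_classes

    def forward(self, x):                              # x: [B, in_dim]
        d = (x.unsqueeze(1) - self.centers) / self.log_sigma.exp()
        firing = torch.exp(-0.5 * (d ** 2).sum(-1))    # rule firing strengths [B, R]
        w = firing / firing.sum(-1, keepdim=True).clamp(min=1e-8)
        rule_out = self.consequent(x).view(-1, self.n_rules, self.n_classes)
        return (w.unsqueeze(-1) * rule_out).sum(1)     # [B, C]

def distill_step(tsk, teacher_logits, x, optimizer, temp=2.0):
    """Match the fuzzy system's outputs to the teacher's softened predictions."""
    optimizer.zero_grad()
    loss = F.kl_div(F.log_softmax(tsk(x) / temp, dim=-1),
                    F.softmax(teacher_logits / temp, dim=-1),
                    reduction="batchmean") * temp ** 2
    loss.backward(); optimizer.step()
    return loss.item()
```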
arXiv Detail & Related papers (2020-10-10T10:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.