Online Cross-Layer Knowledge Distillation on Graph Neural Networks with
Deep Supervision
- URL: http://arxiv.org/abs/2210.13743v1
- Date: Tue, 25 Oct 2022 03:21:20 GMT
- Title: Online Cross-Layer Knowledge Distillation on Graph Neural Networks with
Deep Supervision
- Authors: Jiongyu Guo, Defang Chen, Can Wang
- Abstract summary: Graph neural networks (GNNs) have become one of the most popular research topics in both academia and industry.
Large-scale datasets pose great challenges for deploying GNNs on edge devices with limited resources.
We propose a novel online knowledge distillation framework called Alignahead++ in this paper.
- Score: 6.8080936803807734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph neural networks (GNNs) have become one of the most popular research
topics in both academia and industry communities for their strong ability in
handling irregular graph data. However, large-scale datasets pose great
challenges for deploying GNNs on edge devices with limited resources, and model
compression techniques have therefore drawn considerable research attention. Existing
model compression techniques such as knowledge distillation (KD) mainly focus
on convolutional neural networks (CNNs). Only limited attempts have been made
recently for distilling knowledge from GNNs in an offline manner. As the
performance of the teacher model does not necessarily improve as the number of
layers increases in GNNs, selecting an appropriate teacher model will require
substantial efforts. To address these challenges, we propose a novel online
knowledge distillation framework called Alignahead++ in this paper.
Alignahead++ transfers structure and feature information in a student layer to
the previous layer of another simultaneously trained student model in an
alternating training procedure. Meanwhile, to avoid the over-smoothing problem in
GNNs, deep supervision is employed in Alignahead++ by adding an auxiliary
classifier in each intermediate layer to prevent the collapse of the node
feature embeddings. Experimental results on four datasets including PPI, Cora,
PubMed and CiteSeer demonstrate that the student performance is consistently
boosted in our collaborative training framework without supervision from a
pre-trained teacher model, and that its effectiveness can generally be improved
by increasing the number of students.
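The abstract describes two components: a cross-layer alignment that matches a student layer to the previous layer of another student, and deep supervision via auxiliary classifiers at intermediate layers. A minimal numpy sketch of both objectives follows; the function names and the MSE/cross-entropy choices are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def alignahead_loss(feats_a, feats_b):
    """Cross-layer alignment sketch: layer l of student A is matched to
    layer l-1 of student B, so information propagates "one layer ahead".
    feats_a, feats_b: lists of per-layer node-feature matrices."""
    loss = 0.0
    for l in range(1, len(feats_a)):
        diff = feats_a[l] - feats_b[l - 1]
        loss += float(np.mean(diff ** 2))  # assumed MSE alignment
    return loss

def deep_supervision_loss(layer_logits, labels):
    """Deep-supervision sketch: a softmax cross-entropy is applied to the
    logits of an auxiliary classifier attached to every intermediate layer,
    discouraging collapsed (over-smoothed) node embeddings."""
    total = 0.0
    for logits in layer_logits:
        shifted = logits - logits.max(axis=1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        total += float(-log_probs[np.arange(len(labels)), labels].mean())
    return total
```

In an alternating training procedure, each student would minimize its task loss plus `alignahead_loss` against a peer plus `deep_supervision_loss` on its own intermediate layers.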
Related papers
- DFA-GNN: Forward Learning of Graph Neural Networks by Direct Feedback Alignment [57.62885438406724]
Graph neural networks are recognized for their strong performance across various applications.
BP has limitations that challenge its biological plausibility and affect the efficiency, scalability and parallelism of training neural networks for graph-based tasks.
We propose DFA-GNN, a novel forward learning framework tailored for GNNs with a case study of semi-supervised learning.
arXiv Detail & Related papers (2024-06-04T07:24:51Z) - Label Deconvolution for Node Representation Learning on Large-scale
Attributed Graphs against Learning Bias [75.44877675117749]
We propose an efficient label regularization technique, namely Label Deconvolution (LD), to alleviate the learning bias by a novel and highly scalable approximation to the inverse mapping of GNNs.
Experiments demonstrate that LD significantly outperforms state-of-the-art methods on the Open Graph Benchmark datasets.
arXiv Detail & Related papers (2023-09-26T13:09:43Z) - Distributed Graph Neural Network Training: A Survey [51.77035975191926]
Graph neural networks (GNNs) are a type of deep learning models that are trained on graphs and have been successfully applied in various domains.
Despite the effectiveness of GNNs, it is still challenging for GNNs to efficiently scale to large graphs.
As a remedy, distributed computing has become a promising solution for training large-scale GNNs.
arXiv Detail & Related papers (2022-11-01T01:57:00Z) - On-Device Domain Generalization [93.79736882489982]
Domain generalization is critical to on-device machine learning applications.
We find that knowledge distillation is a strong candidate for solving the problem.
We propose a simple idea called out-of-distribution knowledge distillation (OKD), which aims to teach the student how the teacher handles (synthetic) out-of-distribution data.
arXiv Detail & Related papers (2022-09-15T17:59:31Z) - Compressing Deep Graph Neural Networks via Adversarial Knowledge
Distillation [41.00398052556643]
We propose a novel Adversarial Knowledge Distillation framework for graph models named GraphAKD.
The discriminator distinguishes between teacher knowledge and what the student inherits, while the student GNN works as a generator and aims to fool the discriminator.
The results imply that GraphAKD can precisely transfer knowledge from a complicated teacher GNN to a compact student GNN.
arXiv Detail & Related papers (2022-05-24T00:04:43Z) - Data-Free Adversarial Knowledge Distillation for Graph Neural Networks [62.71646916191515]
We propose the first end-to-end framework for data-free adversarial knowledge distillation on graph structured data (DFAD-GNN)
To be specific, DFAD-GNN employs a generative adversarial network with three components: a pre-trained teacher model and a student model, which serve as two discriminators, and a generator that synthesizes training graphs for distilling knowledge from the teacher into the student.
Our DFAD-GNN significantly surpasses state-of-the-art data-free baselines in the graph classification task.
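The DFAD-GNN summary above describes an adversarial objective in which a generator probes the teacher-student gap on synthesized graphs. A minimal numpy sketch of that disagreement term follows; the function name and the L1 distance are illustrative assumptions about the general data-free adversarial KD scheme, not the paper's exact code.

```python
import numpy as np

def disagreement(teacher_logits, student_logits):
    """Data-free adversarial KD sketch: the generator is trained to
    *maximize* this teacher-student disagreement on the graphs it
    synthesizes, while the student is trained to *minimize* it, so the
    student is probed exactly where it still differs from the teacher."""
    return float(np.mean(np.abs(teacher_logits - student_logits)))
```

The disagreement is zero only when the student reproduces the teacher's outputs on the generated graphs, which is the fixed point the alternating min-max training drives toward.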
arXiv Detail & Related papers (2022-05-08T08:19:40Z) - Alignahead: Online Cross-Layer Knowledge Extraction on Graph Neural
Networks [6.8080936803807734]
Existing knowledge distillation methods on graph neural networks (GNNs) are almost exclusively offline.
We propose a novel online knowledge distillation framework to resolve this problem.
We develop a cross-layer distillation strategy by aligning one student layer ahead with a layer at a different depth of another student model.
arXiv Detail & Related papers (2022-05-05T06:48:13Z) - Graph-Free Knowledge Distillation for Graph Neural Networks [30.38128029453977]
We propose the first dedicated approach to distilling knowledge from a graph neural network without graph data.
The proposed graph-free KD (GFKD) learns graph topology structures for knowledge transfer by modeling them with multinomial distribution.
We provide the strategies for handling different types of prior knowledge in the graph data or the GNNs.
arXiv Detail & Related papers (2021-05-16T21:38:24Z) - Attentive Graph Neural Networks for Few-Shot Learning [74.01069516079379]
Graph Neural Networks (GNNs) have demonstrated superior performance in many challenging applications, including few-shot learning tasks.
Despite their powerful capacity to learn and generalize from few samples, GNNs usually suffer from severe over-fitting and over-smoothing as the model becomes deep.
We propose a novel Attentive GNN to tackle these challenges, by incorporating a triple-attention mechanism.
arXiv Detail & Related papers (2020-07-14T07:43:09Z) - Distilling Spikes: Knowledge Distillation in Spiking Neural Networks [22.331135708302586]
Spiking Neural Networks (SNNs) are energy-efficient computing architectures that exchange spikes to process information.
We propose techniques for knowledge distillation in spiking neural networks for the task of image classification.
Our approach is expected to open up new avenues for deploying high performing large SNN models on resource-constrained hardware platforms.
arXiv Detail & Related papers (2020-05-01T09:36:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.