Alignahead: Online Cross-Layer Knowledge Extraction on Graph Neural
Networks
- URL: http://arxiv.org/abs/2205.02468v1
- Date: Thu, 5 May 2022 06:48:13 GMT
- Title: Alignahead: Online Cross-Layer Knowledge Extraction on Graph Neural
Networks
- Authors: Jiongyu Guo, Defang Chen, Can Wang
- Abstract summary: Existing knowledge distillation methods for graph neural networks (GNNs) are almost all offline.
We propose a novel online knowledge distillation framework to resolve this problem.
We develop a cross-layer distillation strategy by aligning one student layer ahead, i.e., with a layer at a different depth of another student model.
- Score: 6.8080936803807734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing knowledge distillation methods for graph neural networks (GNNs) are almost all offline, where the student model extracts knowledge from a powerful
teacher model to improve its performance. However, a pre-trained teacher model
is not always accessible due to training cost, privacy, etc. In this paper, we
propose a novel online knowledge distillation framework to resolve this
problem. Specifically, each student GNN model learns the extracted local
structure from another simultaneously trained counterpart in an alternating
training procedure. We further develop a cross-layer distillation strategy by aligning one student layer ahead, i.e., with a layer at a different depth of another student model, which in theory spreads the structure information over all layers. Experimental results on five datasets including PPI,
Coauthor-CS/Physics and Amazon-Computer/Photo demonstrate that the student
performance is consistently boosted in our collaborative training framework
without the supervision of a pre-trained teacher model. In addition, we find that our alignahead technique accelerates model convergence and that its effectiveness generally improves as the number of students in training increases. Code is available at:
https://github.com/GuoJY-eatsTG/Alignahead
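To make the cross-layer idea above concrete, here is a minimal sketch in plain PyTorch of two student GNNs trained in alternation, with each student's layer l aligned to layer l+1 of its peer. Everything specific is an assumption for illustration only: the dense toy GCN, the cosine-similarity matrix used as a stand-in for the "extracted local structure", the exact l -> l+1 pairing, and the 0.1 loss weight are not taken from the released implementation (see the repository linked above for the authors' code).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseGCN(nn.Module):
    """Tiny dense GCN that also returns its per-layer hidden states."""

    def __init__(self, in_dim, hid_dim, out_dim, num_layers=3):
        super().__init__()
        dims = [in_dim] + [hid_dim] * (num_layers - 1) + [out_dim]
        self.lins = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1]) for i in range(num_layers)]
        )

    def forward(self, x, adj_norm):
        hiddens = []
        for i, lin in enumerate(self.lins):
            x = adj_norm @ lin(x)                     # simple dense propagation
            if i < len(self.lins) - 1:
                x = F.relu(x)
            hiddens.append(x)
        return x, hiddens


def local_structure(h):
    # Pairwise cosine similarity as a stand-in for the extracted local structure.
    h = F.normalize(h, dim=-1)
    return h @ h.t()


def alignahead_loss(student_hiddens, peer_hiddens):
    # Align layer l of this student with layer l+1 of its peer ("one layer ahead").
    loss = 0.0
    for l in range(len(student_hiddens) - 1):
        loss = loss + F.mse_loss(local_structure(student_hiddens[l]),
                                 local_structure(peer_hiddens[l + 1]))
    return loss


# Toy graph: 50 nodes, random features, random symmetric adjacency with self-loops.
n, d, c = 50, 16, 4
x = torch.randn(n, d)
adj = (torch.rand(n, n) < 0.1).float()
adj = ((adj + adj.t() + torch.eye(n)) > 0).float()
deg_inv_sqrt = adj.sum(1).pow(-0.5)
adj_norm = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]
y = torch.randint(0, c, (n,))

student_a, student_b = DenseGCN(d, 32, c), DenseGCN(d, 32, c)
opt_a = torch.optim.Adam(student_a.parameters(), lr=0.01)
opt_b = torch.optim.Adam(student_b.parameters(), lr=0.01)

for epoch in range(20):
    # Alternating training: update student A against a frozen B, then B against A.
    for student, peer, opt in ((student_a, student_b, opt_a),
                               (student_b, student_a, opt_b)):
        logits, hid = student(x, adj_norm)
        with torch.no_grad():
            _, peer_hid = peer(x, adj_norm)           # peer provides targets only
        loss = F.cross_entropy(logits, y) + 0.1 * alignahead_loss(hid, peer_hid)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The same alternating loop extends to more than two students, which the abstract reports generally improves effectiveness.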
Related papers
- Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods in various model architectures and sizes, reducing training time by up to fourfold.
arXiv Detail & Related papers (2024-09-19T07:05:26Z)
- Frameless Graph Knowledge Distillation [27.831929635701886]
We show how the graph knowledge supplied by the teacher is learned and digested by the student model via both algebra and geometry.
Our proposed model can achieve learning accuracy identical to, or even surpassing, that of the teacher model while maintaining high inference speed.
arXiv Detail & Related papers (2023-07-13T08:56:50Z)
- Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment [52.704331909850026]
We formulate a new knowledge distillation framework to transfer the knowledge from self-supervised pre-trained models to any other student network.
Inspired by the spirit of instance discrimination in self-supervised learning, we model the instance-instance relations by a graph formulation in the feature embedding space.
Our distillation scheme can be flexibly applied to transfer self-supervised knowledge and enhance representation learning on various student networks (a generic sketch of this instance-relation alignment appears after this list).
arXiv Detail & Related papers (2022-11-23T19:27:48Z)
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- Online Cross-Layer Knowledge Distillation on Graph Neural Networks with Deep Supervision [6.8080936803807734]
Graph neural networks (GNNs) have become one of the most popular research topics in both academia and industry.
Large-scale datasets pose great challenges for deploying GNNs on edge devices with limited resources.
We propose a novel online knowledge distillation framework called Alignahead++ in this paper.
arXiv Detail & Related papers (2022-10-25T03:21:20Z)
- Revisiting Knowledge Distillation: An Inheritance and Exploration Framework [153.73692961660964]
Knowledge Distillation (KD) is a popular technique to transfer knowledge from a teacher model to a student model.
We propose a novel inheritance and exploration knowledge distillation framework (IE-KD).
Our IE-KD framework is generic and can be easily combined with existing distillation or mutual learning methods for training deep neural networks.
arXiv Detail & Related papers (2021-07-01T02:20:56Z)
- Extract the Knowledge of Graph Neural Networks and Go Beyond it: An Effective Knowledge Distillation Framework [42.57467126227328]
We propose a framework based on knowledge distillation to address the issues of semi-supervised learning on graphs.
Our framework extracts the knowledge of an arbitrary learned GNN model (teacher model) and injects it into a well-designed student model.
Experimental results show that the learned student model can consistently outperform its corresponding teacher model by 1.4% - 4.7% on average.
arXiv Detail & Related papers (2021-03-04T08:13:55Z)
- Distilling Knowledge from Graph Convolutional Networks [146.71503336770886]
Existing knowledge distillation methods focus on convolutional neural networks (CNNs).
We propose the first dedicated approach to distilling knowledge from a pre-trained graph convolutional network (GCN) model.
We show that our method achieves the state-of-the-art knowledge distillation performance for GCN models.
arXiv Detail & Related papers (2020-03-23T18:23:11Z)
- Efficient Crowd Counting via Structured Knowledge Transfer [122.30417437707759]
Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications.
We propose a novel Structured Knowledge Transfer framework to generate a lightweight but still highly effective student network.
Our models obtain at least 6.5× speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-03-23T08:05:41Z)
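As referenced in the Embedding Graph Alignment entry above, here is a generic sketch of aligning instance-relation graphs built in the embedding space. The cosine-similarity graph, the MSE alignment loss, and the tensor shapes are illustrative assumptions, not that paper's actual scheme.

```python
import torch
import torch.nn.functional as F


def relation_graph(emb):
    # Cosine-similarity matrix over a batch of instance embeddings.
    emb = F.normalize(emb, dim=-1)
    return emb @ emb.t()


def graph_alignment_loss(student_emb, teacher_emb):
    # Match the student's instance-instance relations to the (frozen) teacher's.
    return F.mse_loss(relation_graph(student_emb),
                      relation_graph(teacher_emb).detach())


# Stand-in embeddings: a batch of 32 instances, 128-d teacher vs. 64-d student.
# The relation graphs are both 32x32, so differing embedding dims are comparable.
teacher_emb = torch.randn(32, 128)
student_emb = torch.randn(32, 64, requires_grad=True)
graph_alignment_loss(student_emb, teacher_emb).backward()
```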