Alignahead: Online Cross-Layer Knowledge Extraction on Graph Neural
Networks
- URL: http://arxiv.org/abs/2205.02468v1
- Date: Thu, 5 May 2022 06:48:13 GMT
- Title: Alignahead: Online Cross-Layer Knowledge Extraction on Graph Neural
Networks
- Authors: Jiongyu Guo, Defang Chen, Can Wang
- Abstract summary: Existing knowledge distillation methods for graph neural networks (GNNs) are almost all offline.
We propose a novel online knowledge distillation framework to resolve this problem.
We develop a cross-layer distillation strategy by aligning one student layer ahead, i.e., with a layer at a different depth of another student model.
- Score: 6.8080936803807734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing knowledge distillation methods for graph neural networks (GNNs) are almost all offline, where the student model extracts knowledge from a powerful
teacher model to improve its performance. However, a pre-trained teacher model
is not always accessible due to training cost, privacy, etc. In this paper, we
propose a novel online knowledge distillation framework to resolve this
problem. Specifically, each student GNN model learns the extracted local
structure from another simultaneously trained counterpart in an alternating
training procedure. We further develop a cross-layer distillation strategy by aligning one student layer ahead, i.e., with a layer at a different depth of another student model, which in theory spreads the structure information over all layers. Experimental results on five datasets including PPI,
Coauthor-CS/Physics and Amazon-Computer/Photo demonstrate that the student
performance is consistently boosted in our collaborative training framework
without the supervision of a pre-trained teacher model. In addition, we find that our alignahead technique accelerates model convergence and that its effectiveness generally improves as the number of students in training increases. Code is available at:
https://github.com/GuoJY-eatsTG/Alignahead
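To make the cross-layer idea above concrete, here is a minimal sketch in plain PyTorch of two student GNNs trained in alternation, with each student's layer l aligned to layer l+1 of its peer. Everything specific is an assumption for illustration only: the dense toy GCN, the cosine-similarity matrix used as a stand-in for the "extracted local structure", the exact l -> l+1 pairing, and the 0.1 loss weight are not taken from the released implementation (see the repository linked above for the authors' code).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseGCN(nn.Module):
    """Tiny dense GCN that also returns its per-layer hidden states."""

    def __init__(self, in_dim, hid_dim, out_dim, num_layers=3):
        super().__init__()
        dims = [in_dim] + [hid_dim] * (num_layers - 1) + [out_dim]
        self.lins = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1]) for i in range(num_layers)]
        )

    def forward(self, x, adj_norm):
        hiddens = []
        for i, lin in enumerate(self.lins):
            x = adj_norm @ lin(x)                     # simple dense propagation
            if i < len(self.lins) - 1:
                x = F.relu(x)
            hiddens.append(x)
        return x, hiddens


def local_structure(h):
    # Pairwise cosine similarity as a stand-in for the extracted local structure.
    h = F.normalize(h, dim=-1)
    return h @ h.t()


def alignahead_loss(student_hiddens, peer_hiddens):
    # Align layer l of this student with layer l+1 of its peer ("one layer ahead").
    loss = 0.0
    for l in range(len(student_hiddens) - 1):
        loss = loss + F.mse_loss(local_structure(student_hiddens[l]),
                                 local_structure(peer_hiddens[l + 1]))
    return loss


# Toy graph: 50 nodes, random features, random symmetric adjacency with self-loops.
n, d, c = 50, 16, 4
x = torch.randn(n, d)
adj = (torch.rand(n, n) < 0.1).float()
adj = ((adj + adj.t() + torch.eye(n)) > 0).float()
deg_inv_sqrt = adj.sum(1).pow(-0.5)
adj_norm = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]
y = torch.randint(0, c, (n,))

student_a, student_b = DenseGCN(d, 32, c), DenseGCN(d, 32, c)
opt_a = torch.optim.Adam(student_a.parameters(), lr=0.01)
opt_b = torch.optim.Adam(student_b.parameters(), lr=0.01)

for epoch in range(20):
    # Alternating training: update student A against a frozen B, then B against A.
    for student, peer, opt in ((student_a, student_b, opt_a),
                               (student_b, student_a, opt_b)):
        logits, hid = student(x, adj_norm)
        with torch.no_grad():
            _, peer_hid = peer(x, adj_norm)           # peer provides targets only
        loss = F.cross_entropy(logits, y) + 0.1 * alignahead_loss(hid, peer_hid)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The same alternating loop extends to more than two students, which the abstract reports generally improves effectiveness.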
Related papers
- Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods in various model architectures and sizes, reducing training time by up to fourfold.
arXiv Detail & Related papers (2024-09-19T07:05:26Z)
- Frameless Graph Knowledge Distillation [27.831929635701886]
We show how the graph knowledge supplied by the teacher is learned and digested by the student model via both algebra and geometry.
Our proposed model can achieve learning accuracy identical to, or even surpassing, that of the teacher model while maintaining high inference speed.
arXiv Detail & Related papers (2023-07-13T08:56:50Z)
- Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment [52.704331909850026]
We formulate a new knowledge distillation framework to transfer the knowledge from self-supervised pre-trained models to any other student network.
Inspired by the spirit of instance discrimination in self-supervised learning, we model the instance-instance relations by a graph formulation in the feature embedding space.
Our distillation scheme can be flexibly applied to transfer self-supervised knowledge and enhance representation learning on various student networks (a generic sketch of this instance-relation alignment appears after this list).
arXiv Detail & Related papers (2022-11-23T19:27:48Z)
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- Online Cross-Layer Knowledge Distillation on Graph Neural Networks with Deep Supervision [6.8080936803807734]
Graph neural networks (GNNs) have become one of the most popular research topics in both academia and industry.
Large-scale datasets pose great challenges for deploying GNNs on edge devices with limited resources.
We propose a novel online knowledge distillation framework called Alignahead++ in this paper.
arXiv Detail & Related papers (2022-10-25T03:21:20Z)
- Revisiting Knowledge Distillation: An Inheritance and Exploration Framework [153.73692961660964]
Knowledge Distillation (KD) is a popular technique to transfer knowledge from a teacher model to a student model.
We propose a novel inheritance and exploration knowledge distillation framework (IE-KD).
Our IE-KD framework is generic and can be easily combined with existing distillation or mutual learning methods for training deep neural networks.
arXiv Detail & Related papers (2021-07-01T02:20:56Z)
- Extract the Knowledge of Graph Neural Networks and Go Beyond it: An Effective Knowledge Distillation Framework [42.57467126227328]
We propose a framework based on knowledge distillation to address the issues of semi-supervised learning on graphs.
Our framework extracts the knowledge of an arbitrary learned GNN model (teacher model) and injects it into a well-designed student model.
Experimental results show that the learned student model can consistently outperform its corresponding teacher model by 1.4% - 4.7% on average.
arXiv Detail & Related papers (2021-03-04T08:13:55Z)
- Distilling Knowledge from Graph Convolutional Networks [146.71503336770886]
Existing knowledge distillation methods focus on convolutional neural networks (CNNs).
We propose the first dedicated approach to distilling knowledge from a pre-trained graph convolutional network (GCN) model.
We show that our method achieves the state-of-the-art knowledge distillation performance for GCN models.
arXiv Detail & Related papers (2020-03-23T18:23:11Z)
- Efficient Crowd Counting via Structured Knowledge Transfer [122.30417437707759]
Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications.
We propose a novel Structured Knowledge Transfer framework to generate a lightweight but still highly effective student network.
Our models obtain at least 6.5× speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-03-23T08:05:41Z)
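As referenced in the Embedding Graph Alignment entry above, here is a generic sketch of aligning instance-relation graphs built in the embedding space. The cosine-similarity graph, the MSE alignment loss, and the tensor shapes are illustrative assumptions, not that paper's actual scheme.

```python
import torch
import torch.nn.functional as F


def relation_graph(emb):
    # Cosine-similarity matrix over a batch of instance embeddings.
    emb = F.normalize(emb, dim=-1)
    return emb @ emb.t()


def graph_alignment_loss(student_emb, teacher_emb):
    # Match the student's instance-instance relations to the (frozen) teacher's.
    return F.mse_loss(relation_graph(student_emb),
                      relation_graph(teacher_emb).detach())


# Stand-in embeddings: a batch of 32 instances, 128-d teacher vs. 64-d student.
# The relation graphs are both 32x32, so differing embedding dims are comparable.
teacher_emb = torch.randn(32, 128)
student_emb = torch.randn(32, 64, requires_grad=True)
graph_alignment_loss(student_emb, teacher_emb).backward()
```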