Distilling Knowledge from Graph Convolutional Networks
- URL: http://arxiv.org/abs/2003.10477v4
- Date: Sun, 10 Jan 2021 03:55:05 GMT
- Title: Distilling Knowledge from Graph Convolutional Networks
- Authors: Yiding Yang, Jiayan Qiu, Mingli Song, Dacheng Tao, Xinchao Wang
- Abstract summary: Existing knowledge distillation methods focus on convolutional neural networks (CNNs).
We propose the first dedicated approach to distilling knowledge from a pre-trained graph convolutional network (GCN) model.
We show that our method achieves the state-of-the-art knowledge distillation performance for GCN models.
- Score: 146.71503336770886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing knowledge distillation methods focus on convolutional neural
networks (CNNs), where input samples such as images lie on a grid domain, and
have largely overlooked graph convolutional networks (GCNs) that handle non-grid
data. In this paper, we propose, to the best of our knowledge, the first dedicated
approach to distilling knowledge from a pre-trained GCN model. To enable the
knowledge transfer from the teacher GCN to the student, we propose a local
structure preserving module that explicitly accounts for the topological
semantics of the teacher. In this module, the local structure information from
both the teacher and the student is extracted as distributions, and hence
minimizing the distance between these distributions enables topology-aware
knowledge transfer from the teacher, yielding a compact yet high-performance
student model. Moreover, the proposed approach is readily extendable to dynamic
graph models, where the input graphs for the teacher and the student may
differ. We evaluate the proposed method on two different datasets using GCN
models of different architectures, and demonstrate that our method achieves the
state-of-the-art knowledge distillation performance for GCN models. Code is
publicly available at https://github.com/ihollywhy/DistillGCN.PyTorch.
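The local structure preserving idea can be illustrated with a short PyTorch sketch. This is a minimal reading of the abstract, not the released implementation: each node's similarities to its neighbors are normalized into a distribution, and the student is trained to match the teacher's distributions with a KL divergence. The distance-based similarity and the per-node averaging are assumptions.
```python
import torch

def local_structure_distribution(features, edge_index):
    """For each edge (i, j), score the similarity of node i to its neighbor j,
    then normalize the scores over each node's neighborhood with a softmax.

    features:   [N, d] node embeddings from one GCN layer
    edge_index: [2, E] COO edge list (source, target)
    """
    src, dst = edge_index
    # Negative squared Euclidean distance as an assumed similarity measure.
    sim = -(features[src] - features[dst]).pow(2).sum(dim=-1)
    sim = sim - sim.max()  # global shift for numerical stability (softmax-invariant)
    exp = sim.exp()
    # Sum of exponentiated scores over each source node's neighborhood.
    denom = torch.zeros(features.size(0), dtype=exp.dtype,
                        device=exp.device).index_add_(0, src, exp)
    return exp / (denom[src] + 1e-12)  # [E] edge-wise probabilities

def lsp_loss(teacher_feat, student_feat, edge_index):
    """KL divergence between teacher and student local-structure distributions,
    averaged over nodes. Computed on intermediate-layer embeddings of both models
    and added to the student's task loss."""
    p_t = local_structure_distribution(teacher_feat, edge_index)
    p_s = local_structure_distribution(student_feat, edge_index)
    kl = p_t * (p_t.clamp_min(1e-12).log() - p_s.clamp_min(1e-12).log())
    return kl.sum() / teacher_feat.size(0)
```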
Related papers
- Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment [52.704331909850026]
We formulate a new knowledge distillation framework to transfer the knowledge from self-supervised pre-trained models to any other student network.
Inspired by the spirit of instance discrimination in self-supervised learning, we model instance-instance relations as a graph in the feature embedding space.
Our distillation scheme can be flexibly applied to transfer the self-supervised knowledge to enhance representation learning on various student networks.
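As a rough illustration of the instance-relation idea, the sketch below builds a cosine-similarity graph over a batch of teacher and student embeddings and matches the two graphs with an MSE loss; the paper's exact node- and edge-level alignment terms are not reproduced here.
```python
import torch
import torch.nn.functional as F

def relation_graph(embeddings):
    """Cosine-similarity graph over a batch of instance embeddings: [B, d] -> [B, B]."""
    z = F.normalize(embeddings, dim=-1)
    return z @ z.t()

def graph_alignment_loss(teacher_emb, student_emb):
    """Align the student's instance-relation graph with the teacher's.
    A simple Frobenius-style match; added to the student's own training objective."""
    g_t = relation_graph(teacher_emb).detach()
    g_s = relation_graph(student_emb)
    return F.mse_loss(g_s, g_t)
```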
arXiv Detail & Related papers (2022-11-23T19:27:48Z)
- Neighborhood Convolutional Network: A New Paradigm of Graph Neural Networks for Node Classification [12.062421384484812]
The decoupled Graph Convolutional Network (GCN) decouples neighborhood aggregation and feature transformation in each convolutional layer.
In this paper, we propose a new paradigm of GCN, termed Neighborhood Convolutional Network (NCN).
In this way, the model can inherit the merit of the decoupled GCN for aggregating neighborhood information while developing much more powerful feature learning modules.
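The decoupled aggregate-then-transform pattern referred to above can be sketched as follows. This is a generic illustration of decoupled GCNs under assumed inputs (a pre-normalized dense adjacency), not the NCN architecture itself.
```python
import torch
import torch.nn as nn

class DecoupledGCN(nn.Module):
    """Decoupled aggregate-then-transform sketch: K parameter-free propagation
    steps with a normalized adjacency, followed by an MLP."""

    def __init__(self, in_dim, hidden_dim, num_classes, k_hops=2):
        super().__init__()
        self.k_hops = k_hops
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x, adj_norm):
        # x: [N, in_dim] node features; adj_norm: [N, N] normalized adjacency (D^-1/2 A D^-1/2).
        for _ in range(self.k_hops):
            x = adj_norm @ x      # neighborhood aggregation, no learnable parameters
        return self.mlp(x)        # feature transformation decoupled from aggregation
```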
arXiv Detail & Related papers (2022-11-15T02:02:51Z)
- Geometric Knowledge Distillation: Topology Compression for Graph Neural Networks [80.8446673089281]
We study a new paradigm of knowledge transfer that aims at encoding graph topological information into graph neural networks (GNNs).
We propose Neural Heat Kernel (NHK) to encapsulate the geometric property of the underlying manifold concerning the architecture of GNNs.
A fundamental and principled solution is derived by aligning NHKs on teacher and student models, dubbed Geometric Knowledge Distillation.
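As a rough, hedged illustration of kernel alignment, the sketch below substitutes the classical graph heat kernel exp(-tL), computed from feature-similarity graphs, for the paper's architecture-dependent Neural Heat Kernel, and matches the teacher and student kernels with a Frobenius-style loss. The affinity construction and the diffusion time t are assumptions.
```python
import torch
import torch.nn.functional as F

def heat_kernel(features, t=1.0):
    """Classical graph heat kernel exp(-t L) built from a feature-similarity graph,
    used only as a stand-in for the paper's Neural Heat Kernel."""
    z = F.normalize(features, dim=-1)
    adj = (z @ z.t()).clamp_min(0)               # non-negative affinity graph
    deg = adj.sum(dim=-1)
    d_inv_sqrt = deg.clamp_min(1e-12).rsqrt()
    lap = torch.eye(adj.size(0), device=adj.device) \
          - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return torch.linalg.matrix_exp(-t * lap)

def geometric_kd_loss(teacher_feat, student_feat, t=1.0):
    """Align teacher and student heat kernels with an MSE (Frobenius) match."""
    return F.mse_loss(heat_kernel(student_feat, t),
                      heat_kernel(teacher_feat, t).detach())
```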
arXiv Detail & Related papers (2022-10-24T08:01:58Z)
- Compressing Deep Graph Neural Networks via Adversarial Knowledge Distillation [41.00398052556643]
We propose a novel Adversarial Knowledge Distillation framework for graph models named GraphAKD.
The discriminator distinguishes between teacher knowledge and what the student inherits, while the student GNN works as a generator and aims to fool the discriminator.
The results imply that GraphAKD can precisely transfer knowledge from a complicated teacher GNN to a compact student GNN.
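A minimal sketch of this adversarial setup, with a simple MLP discriminator on node representations, is given below; the paper's discriminator and loss terms are richer, and the binary cross-entropy objectives here are assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """Scores node representations as teacher-like (1) or student-like (0)."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, h):
        return self.net(h).squeeze(-1)

def discriminator_step(disc, teacher_repr, student_repr):
    """Train the discriminator to separate teacher knowledge from what the student inherits."""
    logits_t = disc(teacher_repr.detach())
    logits_s = disc(student_repr.detach())
    return F.binary_cross_entropy_with_logits(logits_t, torch.ones_like(logits_t)) + \
           F.binary_cross_entropy_with_logits(logits_s, torch.zeros_like(logits_s))

def student_step(disc, student_repr, student_logits, labels, train_mask):
    """The student GNN acts as the generator: fool the discriminator while fitting the task."""
    logits_s = disc(student_repr)
    adv = F.binary_cross_entropy_with_logits(logits_s, torch.ones_like(logits_s))
    task = F.cross_entropy(student_logits[train_mask], labels[train_mask])
    return task + adv
```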
arXiv Detail & Related papers (2022-05-24T00:04:43Z)
- Data-Free Adversarial Knowledge Distillation for Graph Neural Networks [62.71646916191515]
We propose the first end-to-end framework for data-free adversarial knowledge distillation on graph structured data (DFAD-GNN).
Specifically, DFAD-GNN employs a generative adversarial framework with three components: a pre-trained teacher model and a student model act as two discriminators, while a generator derives training graphs that are used to distill knowledge from the teacher into the student.
Our DFAD-GNN significantly surpasses state-of-the-art data-free baselines in the graph classification task.
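A compressed sketch of the data-free loop follows; the noise-to-graph generator shown here (dense node features plus a sigmoid adjacency) is a placeholder design, and the L1 disagreement measure between teacher and student outputs is an assumption.
```python
import torch
import torch.nn as nn

class GraphGenerator(nn.Module):
    """Maps noise to a pseudo-graph: dense node features and a soft adjacency matrix."""
    def __init__(self, noise_dim, num_nodes, feat_dim, hidden=128):
        super().__init__()
        self.num_nodes, self.feat_dim = num_nodes, feat_dim
        self.feat_head = nn.Sequential(nn.Linear(noise_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, num_nodes * feat_dim))
        self.edge_head = nn.Sequential(nn.Linear(noise_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, num_nodes * num_nodes))

    def forward(self, z):
        x = self.feat_head(z).view(-1, self.num_nodes, self.feat_dim)
        adj = torch.sigmoid(self.edge_head(z)).view(-1, self.num_nodes, self.num_nodes)
        adj = 0.5 * (adj + adj.transpose(1, 2))   # symmetrize the soft adjacency
        return x, adj

def disagreement(teacher_out, student_out):
    """Mean absolute difference between teacher and student predictions on generated graphs."""
    return (teacher_out - student_out).abs().mean()

# One round of the data-free adversarial loop (teacher frozen):
#   student step:   minimize disagreement(teacher(x, adj), student(x, adj))
#   generator step: maximize the same disagreement, i.e. minimize its negative.
```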
arXiv Detail & Related papers (2022-05-08T08:19:40Z)
- Graph-Free Knowledge Distillation for Graph Neural Networks [30.38128029453977]
We propose the first dedicated approach to distilling knowledge from a graph neural network without graph data.
The proposed graph-free KD (GFKD) learns graph topology structures for knowledge transfer by modeling them with a multinomial distribution.
We also provide strategies for handling different types of prior knowledge in the graph data or the GNNs.
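The structure-learning step can be caricatured as below. The sketch relaxes edges with a sigmoid for brevity, whereas the paper models topology with a multinomial distribution and a dedicated gradient estimator; the teacher's graph-level output interface and the confidence objective are assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PseudoGraph(nn.Module):
    """Learnable node features and edge logits for one synthetic graph."""
    def __init__(self, num_nodes, feat_dim):
        super().__init__()
        self.x = nn.Parameter(torch.randn(num_nodes, feat_dim))
        self.edge_logits = nn.Parameter(torch.zeros(num_nodes, num_nodes))

    def forward(self):
        adj = torch.sigmoid(self.edge_logits)
        return self.x, 0.5 * (adj + adj.t())      # symmetric soft adjacency

def graph_free_step(teacher, pseudo, target_class):
    """Optimize the pseudo-graph so the frozen teacher confidently predicts target_class;
    the learned graphs are then used to distill knowledge into the student."""
    x, adj = pseudo()
    logits = teacher(x, adj)                      # assumed: graph-level logits [num_classes]
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([target_class]))
```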
arXiv Detail & Related papers (2021-05-16T21:38:24Z)
- Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Network (CNN) to another by utilizing sparse representation.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
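The sketch below is only an illustrative stand-in for SRM: teacher features are projected onto a learned dictionary with one soft-thresholding step to obtain sparse codes, and the student is trained to reproduce them. The single-step encoder, the MSE objective, and the assumption that teacher and student features share a dimension are simplifications.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseEncoder(nn.Module):
    """One soft-thresholded projection onto a learned dictionary: a crude stand-in
    for the sparse-coding block described in the paper."""
    def __init__(self, feat_dim, num_atoms, threshold=0.1):
        super().__init__()
        self.dictionary = nn.Linear(feat_dim, num_atoms, bias=False)
        self.threshold = threshold

    def forward(self, features):
        # features: [B, feat_dim] pooled intermediate CNN features.
        return F.softshrink(self.dictionary(features), self.threshold)

def srm_loss(encoder, teacher_feat, student_feat):
    """Match the student's sparse codes to the teacher's (teacher codes are detached)."""
    codes_t = encoder(teacher_feat).detach()
    codes_s = encoder(student_feat)
    return F.mse_loss(codes_s, codes_t)
```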
arXiv Detail & Related papers (2021-03-31T11:47:47Z)
- Extract the Knowledge of Graph Neural Networks and Go Beyond it: An Effective Knowledge Distillation Framework [42.57467126227328]
We propose a framework based on knowledge distillation to address the issues of semi-supervised learning on graphs.
Our framework extracts the knowledge of an arbitrary learned GNN model (teacher model) and injects it into a well-designed student model.
Experimental results show that the learned student model can consistently outperform its corresponding teacher model by 1.4% - 4.7% on average.
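The knowledge-injection step can be illustrated with the standard soft-label distillation loss for node classification shown below. The student's combination of label propagation and feature transformation described in the paper is not reproduced here, and the temperature and mixing weight are assumptions.
```python
import torch
import torch.nn.functional as F

def distill_node_loss(student_logits, teacher_logits, labels, train_mask,
                      temperature=2.0, alpha=0.5):
    """Hard-label loss on labeled nodes plus soft-label loss from the teacher on all nodes."""
    hard = F.cross_entropy(student_logits[train_mask], labels[train_mask])
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft
```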
arXiv Detail & Related papers (2021-03-04T08:13:55Z)
- Inter-Region Affinity Distillation for Road Marking Segmentation [81.3619453527367]
We study the problem of distilling knowledge from a large deep teacher network to a much smaller student network.
Our method is known as Inter-Region Affinity KD (IntRA-KD).
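A hedged sketch of inter-region affinity matching: region mean features are pooled from a feature map using per-region masks, a cosine affinity graph is built for teacher and student, and the two graphs are matched. The region definitions, pooling, and similarity measure used in the paper may differ.
```python
import torch
import torch.nn.functional as F

def region_affinity(feature_map, region_masks):
    """Build an inter-region affinity graph from a feature map.

    feature_map:  [C, H, W] activations from one network
    region_masks: [R, H, W] binary masks, one per region of interest
    Returns an [R, R] cosine-similarity matrix between region mean features.
    """
    masks = region_masks.float()
    area = masks.sum(dim=(1, 2)).clamp_min(1.0)                    # [R]
    centroids = torch.einsum("chw,rhw->rc", feature_map, masks) / area[:, None]
    centroids = F.normalize(centroids, dim=-1)
    return centroids @ centroids.t()

def intra_kd_loss(teacher_map, student_map, region_masks):
    """Match the student's inter-region affinity graph to the teacher's."""
    a_t = region_affinity(teacher_map, region_masks).detach()
    a_s = region_affinity(student_map, region_masks)
    return F.mse_loss(a_s, a_t)
```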
arXiv Detail & Related papers (2020-04-11T04:26:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences of its use.