On Representation Knowledge Distillation for Graph Neural Networks
- URL: http://arxiv.org/abs/2111.04964v1
- Date: Tue, 9 Nov 2021 06:22:27 GMT
- Title: On Representation Knowledge Distillation for Graph Neural Networks
- Authors: Chaitanya K. Joshi, Fayao Liu, Xu Xun, Jie Lin, Chuan-Sheng Foo
- Abstract summary: We study whether preserving the global topology of how the teacher embeds graph data can be a more effective distillation objective for GNNs.
We propose two new approaches which better preserve global topology: (1) Global Structure Preserving loss (GSP) and (2) Graph Contrastive Representation Distillation (G-CRD).
- Score: 15.82821940784549
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation is a promising learning paradigm for boosting the
performance and reliability of resource-efficient graph neural networks (GNNs)
using more expressive yet cumbersome teacher models. Past work on distillation
for GNNs proposed the Local Structure Preserving loss (LSP), which matches
local structural relationships across the student and teacher's node embedding
spaces. In this paper, we make two key contributions:
From a methodological perspective, we study whether preserving the global
topology of how the teacher embeds graph data can be a more effective
distillation objective for GNNs, as real-world graphs often contain latent
interactions and noisy edges. The purely local LSP objective over pre-defined
edges is unable to achieve this as it ignores relationships among disconnected
nodes. We propose two new approaches which better preserve global topology: (1)
Global Structure Preserving loss (GSP), which extends LSP to incorporate all
pairwise interactions; and (2) Graph Contrastive Representation Distillation
(G-CRD), which uses contrastive learning to align the student node embeddings
to those of the teacher in a shared representation space.
From an experimental perspective, we introduce an expanded set of benchmarks
on large-scale real-world datasets where the performance gap between teacher
and student GNNs is non-negligible. We believe this is critical for testing the
efficacy and robustness of knowledge distillation, but it was missing from the LSP
study, which used synthetic datasets with trivial performance gaps. Experiments
across 4 datasets and 14 heterogeneous GNN architectures show that G-CRD
consistently boosts the performance and robustness of lightweight GNN models,
outperforming the structure preserving approaches, LSP and GSP, as well as
baselines adapted from 2D computer vision.
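To make the structure-preserving objectives above concrete, here is a minimal sketch, assuming PyTorch node embeddings of shape [N, d] from a frozen teacher and a trainable student. The cosine similarity kernel and the MSE matching term are illustrative stand-ins, not the paper's exact formulation (the original LSP, for instance, matches normalised similarity distributions rather than raw similarities).

```python
# Hedged sketch of local vs. global structure-preserving distillation losses.
# Kernel choice (cosine) and matching term (MSE) are illustrative assumptions.
import torch
import torch.nn.functional as F


def pairwise_cosine(h: torch.Tensor) -> torch.Tensor:
    """All-pairs cosine similarity between node embeddings ([N, d] -> [N, N])."""
    h = F.normalize(h, dim=-1)
    return h @ h.t()


def lsp_style_loss(h_student: torch.Tensor, h_teacher: torch.Tensor,
                   edge_index: torch.Tensor) -> torch.Tensor:
    """Local matching: compare similarities only over the pre-defined edges,
    so relationships among disconnected nodes are ignored."""
    src, dst = edge_index  # [2, E] long tensor of graph edges
    sim_s = F.cosine_similarity(h_student[src], h_student[dst], dim=-1)
    sim_t = F.cosine_similarity(h_teacher[src], h_teacher[dst], dim=-1)
    return F.mse_loss(sim_s, sim_t.detach())


def gsp_style_loss(h_student: torch.Tensor, h_teacher: torch.Tensor) -> torch.Tensor:
    """Global matching: compare the full N x N similarity matrices, covering
    all pairwise interactions, including disconnected node pairs."""
    return F.mse_loss(pairwise_cosine(h_student), pairwise_cosine(h_teacher).detach())
```

Note that the global variant materialises an N x N similarity matrix, so on large graphs it would typically be computed over mini-batches or sampled node sets rather than the whole graph at once.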
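Similarly, a minimal sketch of contrastive representation distillation in the spirit of G-CRD, again assuming PyTorch: projection heads map student and teacher embeddings into a shared space, and the student embedding of each node is pulled towards the teacher embedding of the same node while being pushed away from all other nodes. The InfoNCE form, projection dimension, and temperature below are illustrative assumptions, not the paper's exact objective.

```python
# Hedged sketch of a contrastive distillation loss aligning student node
# embeddings with teacher embeddings in a shared representation space.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveDistillLoss(nn.Module):
    def __init__(self, dim_student: int, dim_teacher: int,
                 dim_shared: int = 128, tau: float = 0.1):
        super().__init__()
        # Trainable projection heads into the shared space (dims are assumptions).
        self.proj_s = nn.Linear(dim_student, dim_shared)
        self.proj_t = nn.Linear(dim_teacher, dim_shared)
        self.tau = tau

    def forward(self, h_student: torch.Tensor, h_teacher: torch.Tensor) -> torch.Tensor:
        z_s = F.normalize(self.proj_s(h_student), dim=-1)           # [N, d]
        z_t = F.normalize(self.proj_t(h_teacher.detach()), dim=-1)  # teacher is frozen
        logits = z_s @ z_t.t() / self.tau                           # [N, N] similarities
        targets = torch.arange(z_s.size(0), device=z_s.device)      # positives on diagonal
        return F.cross_entropy(logits, targets)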
Related papers
- GRE^2-MDCL: Graph Representation Embedding Enhanced via Multidimensional Contrastive Learning [0.0]
Graph representation learning has emerged as a powerful tool for preserving graph topology when mapping nodes to vector representations.
Current graph neural network models face the challenge of requiring extensive labeled data.
We propose Graph Representation Embedding Enhanced via Multidimensional Contrastive Learning.
arXiv Detail & Related papers (2024-09-12T03:09:05Z)
- Self-Attention Empowered Graph Convolutional Network for Structure Learning and Node Embedding [5.164875580197953]
In representation learning on graph-structured data, many popular graph neural networks (GNNs) fail to capture long-range dependencies.
This paper proposes a novel graph learning framework called the graph convolutional network with self-attention (GCN-SA).
The proposed scheme exhibits an exceptional generalization capability in node-level representation learning.
arXiv Detail & Related papers (2024-03-06T05:00:31Z)
- Breaking the Entanglement of Homophily and Heterophily in Semi-supervised Node Classification [25.831508778029097]
We introduce AMUD, which quantifies the relationship between node profiles and topology from a statistical perspective.
We also propose ADPA as a new directed graph learning paradigm for AMUD.
arXiv Detail & Related papers (2023-12-07T07:54:11Z)
- Label Deconvolution for Node Representation Learning on Large-scale Attributed Graphs against Learning Bias [75.44877675117749]
We propose an efficient label regularization technique, namely Label Deconvolution (LD), to alleviate the learning bias by a novel and highly scalable approximation to the inverse mapping of GNNs.
Experiments demonstrate that LD significantly outperforms state-of-the-art methods on Open Graph Benchmark datasets.
arXiv Detail & Related papers (2023-09-26T13:09:43Z)
- Learning Strong Graph Neural Networks with Weak Information [64.64996100343602]
We develop a principled approach to the problem of graph learning with weak information (GLWI).
We propose D$2$PT, a dual-channel GNN framework that performs long-range information propagation not only on the input graph with incomplete structure, but also on a global graph that encodes global semantic similarities.
arXiv Detail & Related papers (2023-05-29T04:51:09Z)
- T2-GNN: Graph Neural Networks for Graphs with Incomplete Features and Structure via Teacher-Student Distillation [65.43245616105052]
Graph Neural Networks (GNNs) have been a prevailing technique for tackling various analysis tasks on graph data.
In this paper, we propose a general GNN framework based on teacher-student distillation to improve the performance of GNNs on incomplete graphs.
arXiv Detail & Related papers (2022-12-24T13:49:44Z)
- Localized Contrastive Learning on Graphs [110.54606263711385]
We introduce a simple yet effective contrastive model named Localized Graph Contrastive Learning (Local-GCL).
In spite of its simplicity, Local-GCL achieves quite competitive performance in self-supervised node representation learning tasks on graphs with various scales and properties.
arXiv Detail & Related papers (2022-12-08T23:36:00Z)
- Tackling Oversmoothing of GNNs with Contrastive Learning [35.88575306925201]
Graph neural networks (GNNs) combine the relational structure of graph data with representation learning capability.
Oversmoothing makes the final representations of nodes indiscriminative, thus deteriorating the node classification and link prediction performance.
We propose the Topology-guided Graph Contrastive Layer, named TGCL, which is the first de-oversmoothing method maintaining all three mentioned metrics.
arXiv Detail & Related papers (2021-10-26T15:56:16Z)
- Self-supervised Graph Learning for Recommendation [69.98671289138694]
We explore self-supervised learning on user-item graph for recommendation.
An auxiliary self-supervised task reinforces node representation learning via self-discrimination.
Empirical studies on three benchmark datasets demonstrate the effectiveness of SGL.
arXiv Detail & Related papers (2020-10-21T06:35:26Z)
- Distilling Knowledge from Graph Convolutional Networks [146.71503336770886]
Existing knowledge distillation methods focus on convolutional neural networks (CNNs).
We propose the first dedicated approach to distilling knowledge from a pre-trained graph convolutional network (GCN) model.
We show that our method achieves the state-of-the-art knowledge distillation performance for GCN models.
arXiv Detail & Related papers (2020-03-23T18:23:11Z)