Iterative Graph Self-Distillation
- URL: http://arxiv.org/abs/2010.12609v2
- Date: Fri, 13 Aug 2021 09:29:07 GMT
- Title: Iterative Graph Self-Distillation
- Authors: Hanlin Zhang, Shuai Lin, Weiyang Liu, Pan Zhou, Jian Tang, Xiaodan
Liang, Eric P. Xing
- Abstract summary: We propose a novel unsupervised graph learning paradigm called Iterative Graph Self-Distillation (IGSD)
IGSD iteratively performs the teacher-student distillation with graph augmentations.
We show that IGSD achieves significant and consistent performance gains on various graph datasets in both unsupervised and semi-supervised settings.
- Score: 161.04351580382078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to discriminatively vectorize graphs is a fundamental challenge that
has attracted increasing attention in recent years. Inspired by the recent success
of unsupervised contrastive learning, we aim to learn graph-level
representation in an unsupervised manner. Specifically, we propose a novel
unsupervised graph learning paradigm called Iterative Graph Self-Distillation
(IGSD) which iteratively performs the teacher-student distillation with graph
augmentations. Different from conventional knowledge distillation, IGSD
constructs the teacher with an exponential moving average of the student model
and distills the knowledge of itself. The intuition behind IGSD is to predict
the teacher network representation of the graph pairs under different augmented
views. As a natural extension, we also apply IGSD to semi-supervised scenarios
by jointly regularizing the network with both supervised and unsupervised
contrastive loss. Finally, we show that finetuning the IGSD-trained models with
self-training can further improve the graph representation power. Empirically,
we achieve significant and consistent performance gains on various graph
datasets in both unsupervised and semi-supervised settings, which well
validates the superiority of IGSD.
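The following is a minimal sketch, in plain PyTorch, of the teacher-student self-distillation loop described in the abstract: two augmented views of a graph, a student that predicts the teacher's representation of the other view, and a teacher maintained as an exponential moving average (EMA) of the student. The mean-pooling MLP encoder, the feature-dropout augmentation, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of EMA teacher-student graph self-distillation (not the paper's code).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphEncoder(nn.Module):
    """Placeholder graph-level encoder: mean-pools node features and applies an MLP.
    A real implementation would use a GNN over the graph structure."""
    def __init__(self, in_dim=16, hid_dim=64, out_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                                 nn.Linear(hid_dim, out_dim))

    def forward(self, node_feats):                       # node_feats: (num_nodes, in_dim)
        return self.mlp(node_feats.mean(dim=0, keepdim=True))  # graph-level: (1, out_dim)

def augment(node_feats, drop_p=0.2):
    """Stand-in graph augmentation: random node-feature dropout."""
    mask = (torch.rand_like(node_feats) > drop_p).float()
    return node_feats * mask

student = GraphEncoder()
teacher = copy.deepcopy(student)                         # teacher starts as a copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)                              # teacher is updated only via EMA
predictor = nn.Linear(32, 32)                            # prediction head on top of the student
opt = torch.optim.Adam(list(student.parameters()) + list(predictor.parameters()), lr=1e-3)

def consistency_loss(p, z):
    """Student prediction p should match the (detached) teacher representation z."""
    return 2 - 2 * F.cosine_similarity(p, z.detach(), dim=-1).mean()

ema_decay = 0.99
graph = torch.randn(10, 16)                              # toy graph: 10 nodes, 16 features

for step in range(100):
    v1, v2 = augment(graph), augment(graph)              # two augmented views of the same graph
    loss = consistency_loss(predictor(student(v1)), teacher(v2)) \
         + consistency_loss(predictor(student(v2)), teacher(v1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    # EMA update of the teacher from the student, as described in the abstract.
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(ema_decay).add_(ps, alpha=1 - ema_decay)
```

For the semi-supervised setting, the abstract adds a supervised contrastive loss on labeled graphs to the unsupervised objective above; that term is omitted here to keep the sketch short.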
Related papers
- Disentangled Generative Graph Representation Learning [51.59824683232925]
This paper introduces DiGGR (Disentangled Generative Graph Representation Learning), a self-supervised learning framework.
It aims to learn latent disentangled factors and utilize them to guide graph mask modeling.
Experiments on 11 public datasets for two different graph learning tasks demonstrate that DiGGR consistently outperforms many previous self-supervised methods.
arXiv Detail & Related papers (2024-08-24T05:13:02Z)
- Isomorphic-Consistent Variational Graph Auto-Encoders for Multi-Level Graph Representation Learning [9.039193854524763]
We propose the Isomorphic-Consistent VGAE (IsoC-VGAE) for task-agnostic graph representation learning.
We first devise a decoding scheme to provide a theoretical guarantee of keeping the isomorphic consistency.
We then propose the Inverse Graph Neural Network (Inv-GNN) decoder as its intuitive realization.
arXiv Detail & Related papers (2023-12-09T10:16:53Z)
- SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) of a pre-trained LM on the downstream task.
We then generate node embeddings using the last hidden states of the fine-tuned LM.
arXiv Detail & Related papers (2023-08-03T07:00:04Z)
- Spectral Augmentations for Graph Contrastive Learning [50.149996923976836]
Contrastive learning has emerged as a premier method for learning representations with or without supervision.
Recent studies have shown its utility in graph representation learning for pre-training.
We propose a set of well-motivated graph transformation operations to provide a bank of candidates when constructing augmentations for a graph contrastive objective.
arXiv Detail & Related papers (2023-02-06T16:26:29Z)
- Coarse-to-Fine Contrastive Learning on Graphs [38.41992365090377]
A variety of graph augmentation strategies have been employed to learn node representations in a self-supervised manner.
We introduce a self-ranking paradigm to ensure that the discriminative information among different nodes can be maintained.
Experiment results on various benchmark datasets verify the effectiveness of our algorithm.
arXiv Detail & Related papers (2022-12-13T08:17:20Z)
- Self-supervised Representation Learning on Electronic Health Records with Graph Kernel Infomax [4.133378723518227]
We propose Graph Kernel Infomax, a self-supervised graph kernel learning approach on the graphical representation of EHR.
Unlike the state-of-the-art, we do not change the graph structure to construct augmented views.
Our approach yields performance on clinical downstream tasks that exceeds the state-of-the-art.
arXiv Detail & Related papers (2022-09-01T16:15:08Z)
- Latent Augmentation For Better Graph Self-Supervised Learning [20.082614919182692]
We argue that predictive models equipped with latent augmentations and a powerful decoder can achieve comparable or even better representation power than contrastive models.
A novel graph decoder named Wiener Graph Deconvolutional Network is correspondingly designed to perform information reconstruction from augmented latent representations.
arXiv Detail & Related papers (2022-06-26T17:41:59Z)
- Graph Self-supervised Learning with Accurate Discrepancy Learning [64.69095775258164]
We propose a framework that aims to learn the exact discrepancy between the original and the perturbed graphs, coined as Discrepancy-based Self-supervised LeArning (D-SLA)
We validate our method on various graph-related downstream tasks, including molecular property prediction, protein function prediction, and link prediction tasks, on which our model largely outperforms relevant baselines.
arXiv Detail & Related papers (2022-02-07T08:04:59Z)
- Towards Unsupervised Deep Graph Structure Learning [67.58720734177325]
We propose an unsupervised graph structure learning paradigm, where the learned graph topology is optimized by data itself without any external guidance.
Specifically, we generate a learning target from the original data as an "anchor graph", and use a contrastive loss to maximize the agreement between the anchor graph and the learned graph.
arXiv Detail & Related papers (2022-01-17T11:57:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.