Automated Graph Self-supervised Learning via Multi-teacher Knowledge
Distillation
- URL: http://arxiv.org/abs/2210.02099v1
- Date: Wed, 5 Oct 2022 08:39:13 GMT
- Title: Automated Graph Self-supervised Learning via Multi-teacher Knowledge
Distillation
- Authors: Lirong Wu, Yufei Huang, Haitao Lin, Zicheng Liu, Tianyu Fan, Stan Z.
Li
- Abstract summary: This paper studies the problem of how to automatically, adaptively, and dynamically learn instance-level self-supervised learning strategies for each node.
We propose a novel multi-teacher knowledge distillation framework for Automated Graph Self-Supervised Learning (AGSSL).
Experiments on eight datasets show that AGSSL can benefit from multiple pretext tasks, outperforming the corresponding individual tasks.
- Score: 43.903582264697974
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning on graphs has recently achieved remarkable success
in graph representation learning. With hundreds of self-supervised pretext
tasks proposed over the past few years, the field has developed rapidly, and the
key challenge is no longer to design ever more powerful yet complex pretext
tasks, but to make more effective use of those already at hand. This paper
studies the problem of how to automatically, adaptively, and dynamically learn
instance-level self-supervised learning strategies for each node from a given
pool of pretext tasks. In this paper, we propose a novel multi-teacher
knowledge distillation framework for Automated Graph Self-Supervised Learning
(AGSSL), which consists of two main branches: (i) Knowledge Extraction:
training multiple teachers with different pretext tasks, so as to extract
different levels of knowledge with different inductive biases; (ii) Knowledge
Integration: integrating different levels of knowledge and distilling them into
the student model. Rather than simply treating different teachers as equally
important, we provide a provable theoretical guideline for how to integrate the
knowledge of different teachers, i.e., the integrated teacher probability
should be close to the true Bayesian class-probability. To approach the
theoretical optimum in practice, two adaptive knowledge integration strategies
are proposed to construct a relatively "good" integrated teacher. Extensive
experiments on eight datasets show that AGSSL can benefit from multiple pretext
tasks, outperforming the corresponding individual tasks; by combining a few
simple but classical pretext tasks, AGSSL achieves performance comparable to
that of other leading methods.
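To make the two-branch pipeline concrete, below is a minimal PyTorch-style sketch of the idea: given per-node class distributions from several teachers (each assumed to be pre-trained with a different pretext task, i.e., the Knowledge Extraction branch), it forms an integrated teacher distribution with per-node weights and distills it into a student (the Knowledge Integration branch). The entropy-based weighting, the `integrate_teachers`/`distillation_loss` helpers, and all shapes and hyperparameters are illustrative assumptions, not the paper's actual adaptive integration strategies.

```python
# Hedged sketch of the two-branch AGSSL idea (knowledge extraction + integration).
# This is NOT the authors' code: the teachers, the entropy-based weighting, and
# all hyperparameters below are illustrative assumptions.
import torch
import torch.nn.functional as F


def integrate_teachers(teacher_logits, tau=1.0, eps=1e-8):
    """Combine per-node class distributions from K teachers.

    teacher_logits: list of K tensors, each [num_nodes, num_classes], produced
    by teachers assumed to be pre-trained on different pretext tasks.
    Returns an integrated distribution [num_nodes, num_classes] whose per-node
    weights favour low-entropy (more confident) teachers -- one simple proxy
    for pushing the mixture towards the unknown Bayesian class-probability.
    """
    probs = torch.stack([F.softmax(z / tau, dim=-1) for z in teacher_logits])  # [K, N, C]
    entropy = -(probs * (probs + eps).log()).sum(-1)       # [K, N]
    weights = F.softmax(-entropy, dim=0).unsqueeze(-1)     # [K, N, 1], sums to 1 over K
    return (weights * probs).sum(0)                        # [N, C]


def distillation_loss(student_logits, integrated_probs, labels, train_mask,
                      tau=1.0, alpha=0.5):
    """Cross-entropy on labelled nodes + KL to the integrated teacher on all nodes."""
    ce = F.cross_entropy(student_logits[train_mask], labels[train_mask])
    kl = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                  integrated_probs, reduction="batchmean") * tau ** 2
    return alpha * ce + (1 - alpha) * kl


# Usage sketch with random tensors standing in for real teacher/student outputs.
if __name__ == "__main__":
    num_nodes, num_classes, num_teachers = 8, 4, 3
    teacher_logits = [torch.randn(num_nodes, num_classes) for _ in range(num_teachers)]
    student_logits = torch.randn(num_nodes, num_classes, requires_grad=True)
    labels = torch.randint(0, num_classes, (num_nodes,))
    train_mask = torch.zeros(num_nodes, dtype=torch.bool)
    train_mask[:4] = True

    integrated = integrate_teachers(teacher_logits)
    loss = distillation_loss(student_logits, integrated, labels, train_mask)
    loss.backward()
    print(float(loss))
```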
Related papers
- Teacher-student curriculum learning for reinforcement learning [1.7259824817932292]
Reinforcement learning (RL) is a popular paradigm for sequential decision-making problems.
The sample inefficiency of deep reinforcement learning methods is a significant obstacle when applying RL to real-world problems.
We propose a teacher-student curriculum learning setting where we simultaneously train a teacher that selects tasks for the student while the student learns how to solve the selected task.
arXiv Detail & Related papers (2022-10-31T14:45:39Z) - Generalized Knowledge Distillation via Relationship Matching [53.69235109551099]
Knowledge of a well-trained deep neural network (a.k.a. the "teacher") is valuable for learning similar tasks.
Knowledge distillation extracts knowledge from the teacher and integrates it with the target model.
Instead of requiring the teacher to work on the same task as the student, we borrow the knowledge from a teacher trained over a general label space.
arXiv Detail & Related papers (2022-05-04T06:49:47Z) - Self-Supervised Graph Neural Network for Multi-Source Domain Adaptation [51.21190751266442]
Domain adaptation (DA) tackles scenarios where the test data does not fully follow the same distribution as the training data.
By learning from large-scale unlabeled samples, self-supervised learning has now become a new trend in deep learning.
We propose a novel Self-Supervised Graph Neural Network (SSG) to enable more effective inter-task information exchange and knowledge sharing.
arXiv Detail & Related papers (2022-04-08T03:37:56Z) - Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z) - Learning Data Teaching Strategies Via Knowledge Tracing [5.648636668261282]
We propose a novel method, called Knowledge Augmented Data Teaching (KADT), to optimize a data teaching strategy for a student model.
The KADT method incorporates a knowledge tracing model to dynamically capture the knowledge progress of a student model in terms of latent learning concepts.
We have evaluated the performance of the KADT method on four different machine learning tasks including knowledge tracing, sentiment analysis, movie recommendation, and image classification.
arXiv Detail & Related papers (2021-11-13T10:10:48Z) - Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient-optimization-based teacher-aware learner that can incorporate the teacher's cooperative intention into the likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z) - Adaptive Multi-Teacher Multi-level Knowledge Distillation [11.722728148523366]
We propose a novel adaptive multi-teacher multi-level knowledge distillation learning framework (AMTML-KD).
It consists of two novel insights: (i) associating each teacher with a latent representation to adaptively learn instance-level teacher importance weights.
As such, a student model can learn multi-level knowledge from multiple teachers through AMTML-KD.
arXiv Detail & Related papers (2021-03-06T08:18:16Z) - Multi-View Feature Representation for Dialogue Generation with
Bidirectional Distillation [22.14228918338769]
We propose a novel training framework, where the learning of general knowledge is more in line with the idea of reaching consensus.
Our framework effectively improves the model generalization without sacrificing training efficiency.
arXiv Detail & Related papers (2021-02-22T05:23:34Z) - Dual Policy Distillation [58.43610940026261]
Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging tasks of deep reinforcement learning.
In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment to explore different perspectives of the environment.
The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms.
arXiv Detail & Related papers (2020-06-07T06:49:47Z)