Distilling Knowledge from Self-Supervised Teacher by Embedding Graph
Alignment
- URL: http://arxiv.org/abs/2211.13264v1
- Date: Wed, 23 Nov 2022 19:27:48 GMT
- Title: Distilling Knowledge from Self-Supervised Teacher by Embedding Graph
Alignment
- Authors: Yuchen Ma, Yanbei Chen, Zeynep Akata
- Abstract summary: We formulate a new knowledge distillation framework to transfer the knowledge from self-supervised pre-trained models to any other student network.
Inspired by the spirit of instance discrimination in self-supervised learning, we model the instance-instance relations by a graph formulation in the feature embedding space.
Our distillation scheme can be flexibly applied to transfer the self-supervised knowledge to enhance representation learning on various student networks.
- Score: 52.704331909850026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances have indicated the strengths of self-supervised pre-training
for improving representation learning on downstream tasks. Existing works often
utilize self-supervised pre-trained models by fine-tuning on downstream tasks.
However, fine-tuning does not generalize to the case when one needs to build a
customized model architecture different from the self-supervised model. In this
work, we formulate a new knowledge distillation framework to transfer the
knowledge from self-supervised pre-trained models to any other student network
by a novel approach named Embedding Graph Alignment. Specifically, inspired by
the spirit of instance discrimination in self-supervised learning, we model the
instance-instance relations by a graph formulation in the feature embedding
space and distill the self-supervised teacher knowledge to a student network by
aligning the teacher graph and the student graph. Our distillation scheme can
be flexibly applied to transfer the self-supervised knowledge to enhance
representation learning on various student networks. We demonstrate that our
model outperforms multiple representative knowledge distillation methods on
three benchmark datasets, including CIFAR100, STL10, and TinyImageNet. Code is
here: https://github.com/yccm/EGA.
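The abstract does not spell out the exact loss, but the core idea (treating a mini-batch of embeddings as a graph whose nodes are instances and whose edges are instance-instance similarities, then aligning the student graph to the frozen teacher graph) can be sketched as follows. This is a minimal PyTorch sketch under stated assumptions: cosine-similarity edges, a linear projection head to match embedding dimensions, and a simple node-plus-edge alignment objective; the paper's exact formulation is in the linked repository (https://github.com/yccm/EGA).
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmbeddingGraphAlignmentLoss(nn.Module):
    """Minimal sketch of an embedding-graph alignment distillation loss.

    Nodes are the per-instance embeddings in a mini-batch; edges are pairwise
    cosine similarities. The student graph is pulled towards the frozen
    teacher graph. The projection head and the node-plus-edge objective below
    are illustrative assumptions, not the paper's exact loss.
    """

    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Map student features into the teacher's embedding space (assumption).
        self.proj = nn.Linear(student_dim, teacher_dim)

    @staticmethod
    def _edge_matrix(z: torch.Tensor) -> torch.Tensor:
        # Pairwise cosine-similarity graph over the batch: shape (B, B).
        z = F.normalize(z, dim=1)
        return z @ z.t()

    def forward(self, f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        z_s = F.normalize(self.proj(f_s), dim=1)
        z_t = F.normalize(f_t.detach(), dim=1)  # teacher is kept frozen

        # Node alignment: each instance should embed similarly in both graphs.
        node_loss = (1.0 - (z_s * z_t).sum(dim=1)).mean()

        # Edge alignment: instance-instance relations should match.
        edge_loss = F.mse_loss(self._edge_matrix(z_s), self._edge_matrix(z_t))

        return node_loss + edge_loss
```
In a typical setup, f_t would come from the frozen self-supervised teacher and f_s from the student backbone for the same batch, and this alignment loss would be added to the student's task loss.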
Related papers
- Variational Graph Auto-Encoder Based Inductive Learning Method for Semi-Supervised Classification [10.497590357666114]
We propose the Self-Label Augmented VGAE model for inductive graph representation learning.
To leverage the label information for training, our model takes node labels as one-hot encoded inputs and then performs label reconstruction in model training.
Our proposed model achieves promising results on node classification, with particular superiority under semi-supervised learning settings.
arXiv Detail & Related papers (2024-03-26T08:59:37Z)
- DreamTeacher: Pretraining Image Backbones with Deep Generative Models [103.62397699392346]
We introduce a self-supervised feature representation learning framework that utilizes generative networks for pre-training downstream image backbones.
We investigate two types of knowledge distillation, including distilling learned generative features onto target image backbones as an alternative to pretraining these backbones on large labeled datasets such as ImageNet.
We empirically find that our DreamTeacher significantly outperforms existing self-supervised representation learning approaches across the board.
arXiv Detail & Related papers (2023-07-14T17:17:17Z)
- Frameless Graph Knowledge Distillation [27.831929635701886]
We show how the graph knowledge supplied by the teacher is learned and digested by the student model via both algebra and geometry.
Our proposed model can achieve learning accuracy identical to or even surpassing that of the teacher model while maintaining high inference speed.
arXiv Detail & Related papers (2023-07-13T08:56:50Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
- Alignahead: Online Cross-Layer Knowledge Extraction on Graph Neural Networks [6.8080936803807734]
Existing knowledge distillation methods for graph neural networks (GNNs) are almost all offline.
We propose a novel online knowledge distillation framework to resolve this problem.
We develop a cross-layer distillation strategy by aligning one student layer ahead with a layer at a different depth of another student model.
arXiv Detail & Related papers (2022-05-05T06:48:13Z)
- Distill on the Go: Online knowledge distillation in self-supervised learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gain in the presence of noisy and limited labels.
arXiv Detail & Related papers (2021-04-20T09:59:23Z)
- Iterative Graph Self-Distillation [161.04351580382078]
We propose a novel unsupervised graph learning paradigm called Iterative Graph Self-Distillation (IGSD).
IGSD iteratively performs the teacher-student distillation with graph augmentations.
We show that we achieve significant and consistent performance gain on various graph datasets in both unsupervised and semi-supervised settings.
arXiv Detail & Related papers (2020-10-23T18:37:06Z)
- Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
- Distilling Knowledge from Graph Convolutional Networks [146.71503336770886]
Existing knowledge distillation methods focus on convolutional neural networks (CNNs).
We propose the first dedicated approach to distilling knowledge from a pre-trained graph convolutional network (GCN) model.
We show that our method achieves the state-of-the-art knowledge distillation performance for GCN models.
arXiv Detail & Related papers (2020-03-23T18:23:11Z)