Distilling Knowledge from Self-Supervised Teacher by Embedding Graph
Alignment
- URL: http://arxiv.org/abs/2211.13264v1
- Date: Wed, 23 Nov 2022 19:27:48 GMT
- Title: Distilling Knowledge from Self-Supervised Teacher by Embedding Graph
Alignment
- Authors: Yuchen Ma, Yanbei Chen, Zeynep Akata
- Abstract summary: We formulate a new knowledge distillation framework to transfer the knowledge from self-supervised pre-trained models to any other student network.
Inspired by the spirit of instance discrimination in self-supervised learning, we model the instance-instance relations by a graph formulation in the feature embedding space.
Our distillation scheme can be flexibly applied to transfer the self-supervised knowledge to enhance representation learning on various student networks.
- Score: 52.704331909850026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances have indicated the strengths of self-supervised pre-training
for improving representation learning on downstream tasks. Existing works often
utilize self-supervised pre-trained models by fine-tuning on downstream tasks.
However, fine-tuning does not generalize to the case when one needs to build a
customized model architecture different from the self-supervised model. In this
work, we formulate a new knowledge distillation framework to transfer the
knowledge from self-supervised pre-trained models to any other student network
by a novel approach named Embedding Graph Alignment. Specifically, inspired by
the spirit of instance discrimination in self-supervised learning, we model the
instance-instance relations by a graph formulation in the feature embedding
space and distill the self-supervised teacher knowledge to a student network by
aligning the teacher graph and the student graph. Our distillation scheme can
be flexibly applied to transfer the self-supervised knowledge to enhance
representation learning on various student networks. We demonstrate that our
model outperforms multiple representative knowledge distillation methods on
three benchmark datasets, including CIFAR100, STL10, and TinyImageNet. Code is
here: https://github.com/yccm/EGA.
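The abstract does not spell out the exact loss, but the core idea (treating a mini-batch of embeddings as a graph whose nodes are instances and whose edges are instance-instance similarities, then aligning the student graph to the frozen teacher graph) can be sketched as follows. This is a minimal PyTorch sketch under stated assumptions: cosine-similarity edges, a linear projection head to match embedding dimensions, and a simple node-plus-edge alignment objective; the paper's exact formulation is in the linked repository (https://github.com/yccm/EGA).
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmbeddingGraphAlignmentLoss(nn.Module):
    """Minimal sketch of an embedding-graph alignment distillation loss.

    Nodes are the per-instance embeddings in a mini-batch; edges are pairwise
    cosine similarities. The student graph is pulled towards the frozen
    teacher graph. The projection head and the node-plus-edge objective below
    are illustrative assumptions, not the paper's exact loss.
    """

    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Map student features into the teacher's embedding space (assumption).
        self.proj = nn.Linear(student_dim, teacher_dim)

    @staticmethod
    def _edge_matrix(z: torch.Tensor) -> torch.Tensor:
        # Pairwise cosine-similarity graph over the batch: shape (B, B).
        z = F.normalize(z, dim=1)
        return z @ z.t()

    def forward(self, f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        z_s = F.normalize(self.proj(f_s), dim=1)
        z_t = F.normalize(f_t.detach(), dim=1)  # teacher is kept frozen

        # Node alignment: each instance should embed similarly in both graphs.
        node_loss = (1.0 - (z_s * z_t).sum(dim=1)).mean()

        # Edge alignment: instance-instance relations should match.
        edge_loss = F.mse_loss(self._edge_matrix(z_s), self._edge_matrix(z_t))

        return node_loss + edge_loss
```
In a typical setup, f_t would come from the frozen self-supervised teacher and f_s from the student backbone for the same batch, and this alignment loss would be added to the student's task loss.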
Related papers
- Variational Graph Auto-Encoder Based Inductive Learning Method for Semi-Supervised Classification [10.497590357666114]
We propose the Self-Label Augmented VGAE model for inductive graph representation learning.
To leverage the label information for training, our model takes node labels as one-hot encoded inputs and then performs label reconstruction in model training.
Our proposed model achieves promising results on node classification, with particular superiority under semi-supervised learning settings.
arXiv Detail & Related papers (2024-03-26T08:59:37Z)
- DreamTeacher: Pretraining Image Backbones with Deep Generative Models [103.62397699392346]
We introduce a self-supervised feature representation learning framework that utilizes generative networks for pre-training downstream image backbones.
We investigate two types of knowledge distillation, including distilling learned generative features onto target image backbones as an alternative to pretraining these backbones on large labeled datasets such as ImageNet.
We empirically find that our DreamTeacher significantly outperforms existing self-supervised representation learning approaches across the board.
arXiv Detail & Related papers (2023-07-14T17:17:17Z)
- Frameless Graph Knowledge Distillation [27.831929635701886]
We show how the graph knowledge supplied by the teacher is learned and digested by the student model via both algebra and geometry.
Our proposed model can achieve learning accuracy identical to or even surpassing that of the teacher model while maintaining high inference speed.
arXiv Detail & Related papers (2023-07-13T08:56:50Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
- Alignahead: Online Cross-Layer Knowledge Extraction on Graph Neural Networks [6.8080936803807734]
Existing knowledge distillation methods for graph neural networks (GNNs) are almost all offline.
We propose a novel online knowledge distillation framework to resolve this problem.
We develop a cross-layer distillation strategy by aligning one student layer ahead with a layer at a different depth of another student model.
arXiv Detail & Related papers (2022-05-05T06:48:13Z)
- Distill on the Go: Online knowledge distillation in self-supervised learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gain in the presence of noisy and limited labels.
arXiv Detail & Related papers (2021-04-20T09:59:23Z)
- Iterative Graph Self-Distillation [161.04351580382078]
We propose a novel unsupervised graph learning paradigm called Iterative Graph Self-Distillation (IGSD).
IGSD iteratively performs the teacher-student distillation with graph augmentations.
We show that we achieve significant and consistent performance gain on various graph datasets in both unsupervised and semi-supervised settings.
arXiv Detail & Related papers (2020-10-23T18:37:06Z)
- Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
- Distilling Knowledge from Graph Convolutional Networks [146.71503336770886]
Existing knowledge distillation methods focus on convolutional neural networks (CNNs).
We propose the first dedicated approach to distilling knowledge from a pre-trained graph convolutional network (GCN) model.
We show that our method achieves the state-of-the-art knowledge distillation performance for GCN models.
arXiv Detail & Related papers (2020-03-23T18:23:11Z)