On effects of Knowledge Distillation on Transfer Learning
- URL: http://arxiv.org/abs/2210.09668v1
- Date: Tue, 18 Oct 2022 08:11:52 GMT
- Title: On effects of Knowledge Distillation on Transfer Learning
- Authors: Sushil Thapa
- Abstract summary: We propose a machine learning architecture we call TL+KD that combines knowledge distillation with transfer learning.
We show that, by using guidance and knowledge from a larger teacher network during fine-tuning, we can improve the student network and achieve better validation performance, such as higher accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation is a popular machine learning technique that aims to
transfer knowledge from a large 'teacher' network to a smaller 'student'
network and improve the student's performance by training it to emulate the
teacher. In recent years, there has been significant progress in novel
distillation techniques that push performance frontiers across multiple
problems and benchmarks. Most of the reported work focuses on achieving
state-of-the-art results on a specific problem. However, there has been a
significant gap in understanding the process and how it behaves under certain
training scenarios. Similarly, transfer learning (TL) is an effective technique
for training neural networks faster on a limited dataset by reusing
representations learned from a different but related problem. Despite its
effectiveness and popularity, there has been little exploration of how knowledge
distillation interacts with transfer learning. In this thesis, we propose a machine
learning architecture we call TL+KD that combines knowledge distillation with
transfer learning; we then present a quantitative and qualitative comparison of
TL+KD with TL in the domain of image classification. Through this work, we show
that, by using guidance and knowledge from a larger teacher network during
fine-tuning, we can improve the student network and achieve better validation
performance, such as higher accuracy. We characterize the improvement in the validation
performance of the model using a variety of metrics beyond just accuracy
scores, and study its performance in scenarios such as input degradation.
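The abstract describes TL+KD only at a high level: a pretrained student is fine-tuned on the target task while also being guided by a larger teacher's outputs. As a rough illustration, the following is a minimal sketch of how such a fine-tuning step is commonly written in PyTorch, using the standard soft-target distillation loss; the architectures (ResNet-50 teacher, ResNet-18 student), temperature `T`, and mixing weight `alpha` are illustrative assumptions, not settings reported in the thesis.

```python
# Minimal TL+KD fine-tuning sketch (illustrative; hyperparameters are assumptions,
# not the thesis' reported configuration).
import torch
import torch.nn.functional as F
from torchvision import models

# Transfer learning: both networks start from ImageNet-pretrained weights.
teacher = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
student = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

num_classes = 10  # assumed size of the target classification task
teacher.fc = torch.nn.Linear(teacher.fc.in_features, num_classes)
student.fc = torch.nn.Linear(student.fc.in_features, num_classes)
teacher.eval()    # in practice the teacher is fine-tuned on the target task first (not shown)

T, alpha = 4.0, 0.7  # softmax temperature and loss mixing weight (assumed values)
optimizer = torch.optim.SGD(student.parameters(), lr=1e-3, momentum=0.9)

def tl_kd_step(images, labels):
    """One fine-tuning step: hard-label loss plus soft-target guidance from the teacher."""
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)

    ce = F.cross_entropy(student_logits, labels)  # supervised (hard-label) term
    kd = F.kl_div(                                # distillation term on softened logits
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                   # standard temperature scaling

    loss = alpha * kd + (1.0 - alpha) * ce
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, setting `alpha` to zero recovers plain transfer learning (TL), so the TL vs. TL+KD comparison described in the abstract corresponds to fine-tuning with and without the distillation term.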
Related papers
- Multi-Task Multi-Scale Contrastive Knowledge Distillation for Efficient Medical Image Segmentation [0.0]
This thesis aims to investigate the feasibility of knowledge transfer between neural networks for medical image segmentation tasks.
In the context of medical imaging, where the data volumes are often limited, leveraging knowledge from a larger pre-trained network could be useful.
arXiv Detail & Related papers (2024-06-05T12:06:04Z)
- Review helps learn better: Temporal Supervised Knowledge Distillation [9.220654594406508]
We find that, during network training, the evolution of the feature maps follows a temporal sequence property.
Inspired by this observation, we propose Temporal Supervised Knowledge Distillation (TSKD).
arXiv Detail & Related papers (2023-07-03T07:51:08Z)
- Knowledge Distillation via Token-level Relationship Graph [12.356770685214498]
We propose a novel method called Knowledge Distillation with Token-level Relationship Graph (TRG).
By employing TRG, the student model can effectively emulate higher-level semantic information from the teacher model.
We conduct experiments to evaluate the effectiveness of the proposed method against several state-of-the-art approaches.
arXiv Detail & Related papers (2023-06-20T08:16:37Z)
- Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task.
Experiments conducted on various single image super-resolution datasets demonstrate that our proposed method outperforms existing knowledge-representation-based distillation methods.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
- Semi-Supervising Learning, Transfer Learning, and Knowledge Distillation with SimCLR [2.578242050187029]
Recent breakthroughs in the field of semi-supervised learning have achieved results that match state-of-the-art traditional supervised learning methods.
SimCLR is the current state-of-the-art semi-supervised learning framework for computer vision.
arXiv Detail & Related papers (2021-08-02T01:37:39Z)
- Distilling Knowledge via Knowledge Review [69.15050871776552]
We study cross-level connection paths between the teacher and student networks and reveal their great importance.
For the first time in knowledge distillation, cross-stage connection paths are proposed.
Our final nested and compact framework requires negligible overhead and outperforms other methods on a variety of tasks.
arXiv Detail & Related papers (2021-04-19T04:36:24Z)
- Learning Student-Friendly Teacher Networks for Knowledge Distillation [50.11640959363315]
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
In contrast to most existing methods, which rely on effective training of student models given pretrained teachers, we aim to learn teacher models that are friendly to students.
arXiv Detail & Related papers (2021-02-12T07:00:17Z)
- Point Adversarial Self Mining: A Simple Method for Facial Expression Recognition [79.75964372862279]
We propose Point Adversarial Self Mining (PASM) to improve the recognition accuracy in facial expression recognition.
PASM uses a point adversarial attack method and a trained teacher network to locate the most informative position related to the target task.
The adaptive generation of learning materials and the teacher/student updates can be repeated multiple times, improving the network's capability iteratively.
arXiv Detail & Related papers (2020-08-26T06:39:24Z)
- Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
- Heterogeneous Knowledge Distillation using Information Flow Modeling [82.83891707250926]
We propose a novel KD method that works by modeling the information flow through the various layers of the teacher model.
The proposed method is capable of overcoming the aforementioned limitations by using an appropriate supervision scheme during the different phases of the training process.
arXiv Detail & Related papers (2020-05-02T06:56:56Z)
- Inter- and Intra-domain Knowledge Transfer for Related Tasks in Deep Character Recognition [2.320417845168326]
Pre-training a deep neural network on the ImageNet dataset is a common practice for training deep learning models.
The technique of pre-training on one task and then retraining on a new one is called transfer learning.
In this paper we analyse the effectiveness of using deep transfer learning for character recognition tasks.
arXiv Detail & Related papers (2020-01-02T14:18:25Z)