Generative Denoise Distillation: Simple Stochastic Noises Induce
Efficient Knowledge Transfer for Dense Prediction
- URL: http://arxiv.org/abs/2401.08332v2
- Date: Wed, 17 Jan 2024 07:18:11 GMT
- Title: Generative Denoise Distillation: Simple Stochastic Noises Induce
Efficient Knowledge Transfer for Dense Prediction
- Authors: Zhaoge Liu, Xiaohao Xu, Yunkang Cao, Weiming Shen
- Abstract summary: We propose an innovative method, Generative Denoise Distillation (GDD), to transfer knowledge from a teacher to a student.
GDD adds stochastic noises to the concept feature of the student, which a shallow network then embeds into a generated instance feature.
We extensively experiment with object detection, instance segmentation, and semantic segmentation to demonstrate the versatility and effectiveness of our method.
- Score: 3.2976453916809803
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation is the process of transferring knowledge from a more
powerful large model (teacher) to a simpler counterpart (student). Numerous
current approaches involve the student imitating the knowledge of the teacher
directly. However, redundancy still exists in the learned representations
through these prevalent methods, which tend to learn each spatial location's
features indiscriminately. To derive a more compact representation (concept
feature) from the teacher, inspired by human cognition, we suggest an
innovative method, termed Generative Denoise Distillation (GDD), where
stochastic noises are added to the concept feature of the student, which a
shallow network then embeds into a generated instance feature. Then, the generated
instance feature is aligned with the knowledge of the instance from the
teacher. We extensively experiment with object detection, instance
segmentation, and semantic segmentation to demonstrate the versatility and
effectiveness of our method. Notably, GDD achieves new state-of-the-art
performance in the tasks mentioned above. We have achieved substantial
improvements in semantic segmentation by enhancing PSPNet and DeepLabV3, both
of which are based on ResNet-18, resulting in mIoU scores of 74.67 and 77.69,
respectively, surpassing their previous scores of 69.85 and 73.20 on the
Cityscapes dataset of 20 categories. The source code is available at
https://github.com/ZhgLiu/GDD.
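For concreteness, the pipeline described above (add stochastic noise to the student's concept feature, generate an instance feature with a shallow network, align it with the teacher's instance feature) can be sketched in a few lines of PyTorch. This is a minimal illustrative sketch only: the module names, channel sizes, noise scale, and plain MSE alignment loss are assumptions made for this example, not the authors' released implementation (the official code is in the repository linked above).

```python
# Minimal, illustrative sketch of the GDD idea from the abstract.
# All names, shapes, and hyperparameters here are assumptions, not the
# authors' implementation (see https://github.com/ZhgLiu/GDD for the original).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShallowGenerator(nn.Module):
    """Shallow network mapping a noised student concept feature to a
    generated instance feature (hypothetical architecture)."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(student_channels, teacher_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(teacher_channels, teacher_channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def gdd_distillation_loss(
    student_feat: torch.Tensor,   # (B, Cs, H, W) concept feature from the student
    teacher_feat: torch.Tensor,   # (B, Ct, H, W) instance feature from the teacher
    generator: ShallowGenerator,
    noise_std: float = 0.1,       # assumed noise scale
) -> torch.Tensor:
    # 1) Add simple stochastic (Gaussian) noise to the student's concept feature.
    noised = student_feat + noise_std * torch.randn_like(student_feat)
    # 2) A shallow network generates an instance feature from the noised input.
    generated = generator(noised)
    # 3) Align the generated instance feature with the (frozen) teacher feature.
    return F.mse_loss(generated, teacher_feat.detach())


if __name__ == "__main__":
    # Toy usage with random tensors standing in for backbone features.
    student_feat = torch.randn(2, 256, 32, 32, requires_grad=True)
    teacher_feat = torch.randn(2, 512, 32, 32)
    gen = ShallowGenerator(student_channels=256, teacher_channels=512)
    loss = gdd_distillation_loss(student_feat, teacher_feat, gen)
    loss.backward()
    print(f"GDD distillation loss: {loss.item():.4f}")
```

In practice this loss would be added to the student's task loss (detection, instance segmentation, or semantic segmentation), with the noise injection acting as the "generative denoise" signal that regularizes the transferred representation.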
Related papers
- I2CKD : Intra- and Inter-Class Knowledge Distillation for Semantic Segmentation [1.433758865948252]
This paper proposes a new knowledge distillation method tailored for image semantic segmentation, termed Intra- and Inter-Class Knowledge Distillation (I2CKD).
The focus of this method is on capturing and transferring knowledge between the intermediate layers of the teacher (cumbersome model) and the student (compact model).
arXiv Detail & Related papers (2024-03-27T12:05:22Z)
- Augmentation-Free Dense Contrastive Knowledge Distillation for Efficient Semantic Segmentation [16.957139277317005]
Augmentation-free Dense Contrastive Knowledge Distillation (Af-DCD) is a new contrastive distillation learning paradigm.
Af-DCD trains compact and accurate deep neural networks for semantic segmentation applications.
arXiv Detail & Related papers (2023-12-07T09:37:28Z)
- Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD).
We state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature.
We propose a novel KD method, dubbed DiffKD, that explicitly denoises and matches features using diffusion models (see the sketch after this list).
arXiv Detail & Related papers (2023-05-25T04:49:34Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Knowledge Distillation Meets Open-Set Semi-Supervised Learning [69.21139647218456]
We propose a novel method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student.
At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL).
Our method significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine face recognition tasks.
arXiv Detail & Related papers (2022-05-13T15:15:27Z)
- Energy-based Latent Aligner for Incremental Learning [83.0135278697976]
Deep learning models tend to forget their earlier knowledge while incrementally learning new tasks.
This behavior emerges because the parameter updates optimized for the new tasks may not align well with the updates suitable for older tasks.
We propose ELI: Energy-based Latent Aligner for Incremental Learning.
arXiv Detail & Related papers (2022-03-28T17:57:25Z)
- Deep Structured Instance Graph for Distilling Object Detectors [82.16270736573176]
We present a simple knowledge structure to exploit and encode information inside the detection system to facilitate detector knowledge distillation.
We achieve new state-of-the-art results on the challenging COCO object detection task with diverse student-teacher pairs on both one- and two-stage detectors.
arXiv Detail & Related papers (2021-09-27T08:26:00Z)
- Distilling Knowledge via Knowledge Review [69.15050871776552]
We study the factor of cross-level connection paths between teacher and student networks, and reveal its great importance.
For the first time in knowledge distillation, cross-stage connection paths are proposed.
Our final nested and compact framework requires negligible overhead and outperforms other methods on a variety of tasks.
arXiv Detail & Related papers (2021-04-19T04:36:24Z)
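As referenced in the Knowledge Diffusion for Distillation (DiffKD) entry above, the core idea there is to denoise student features before matching them to teacher features. The sketch below shows a heavily simplified, single-step variant of that "denoise then match" idea, not the DiffKD method itself, which runs a diffusion model over multiple timesteps; every name and shape here is a hypothetical placeholder.

```python
# Heavily simplified "denoise then match" sketch inspired by the DiffKD entry.
# Not the DiffKD implementation: a single learned denoising step stands in for
# the full diffusion process, and all names/shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureDenoiser(nn.Module):
    """Tiny convolutional denoiser applied to student features (assumed design)."""

    def __init__(self, channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def denoise_and_match_loss(student_feat: torch.Tensor,
                           teacher_feat: torch.Tensor,
                           denoiser: FeatureDenoiser) -> torch.Tensor:
    # Treat the student feature as a "noisy" view of the teacher feature:
    # denoise it, then match the denoised result to the detached teacher feature.
    denoised = denoiser(student_feat)
    return F.mse_loss(denoised, teacher_feat.detach())


if __name__ == "__main__":
    s = torch.randn(2, 256, 32, 32)
    t = torch.randn(2, 256, 32, 32)
    loss = denoise_and_match_loss(s, t, FeatureDenoiser(256))
    print(f"denoise-and-match loss: {loss.item():.4f}")
```

A faithful diffusion-based variant would replace FeatureDenoiser with a noise-prediction network applied over several reverse diffusion steps before the matching loss.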