Normalized Feature Distillation for Semantic Segmentation
- URL: http://arxiv.org/abs/2207.05256v1
- Date: Tue, 12 Jul 2022 01:54:25 GMT
- Title: Normalized Feature Distillation for Semantic Segmentation
- Authors: Tao Liu, Xi Yang, Chenshu Chen
- Abstract summary: We propose a simple yet effective feature distillation method called normalized feature distillation (NFD).
Our method achieves state-of-the-art distillation results for semantic segmentation on Cityscapes, VOC 2012, and ADE20K datasets.
- Score: 6.882655287146012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   As a promising approach in model compression, knowledge distillation improves
the performance of a compact model by transferring the knowledge from a
cumbersome one. The kind of knowledge used to guide the training of the student
is important. Previous distillation methods in semantic segmentation strive to
extract various forms of knowledge from the features, which involve elaborate
manual design relying on prior information and have limited performance gains.
In this paper, we propose a simple yet effective feature distillation method
called normalized feature distillation (NFD), aiming to enable effective
distillation with the original features without the need to manually design new
forms of knowledge. The key idea is to prevent the student from focusing on
imitating the magnitude of the teacher's feature response by normalization. Our
method achieves state-of-the-art distillation results for semantic segmentation
on Cityscapes, VOC 2012, and ADE20K datasets. Code will be available.
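
The key idea stated in the abstract, removing the magnitude of the feature response before matching, can be illustrated with a minimal NumPy sketch. This is not the authors' code. It assumes that "normalization" means standardizing each channel over its spatial positions and that the matching loss is a mean squared error. The function names (`normalize_features`, `normalized_feature_loss`) are hypothetical, and the sketch further assumes the student and teacher features already share the same shape.

```python
import numpy as np

def normalize_features(feat, eps=1e-5):
    """Standardize an (N, C, H, W) feature map per sample and per channel.

    Assumption: statistics are taken over the spatial axes (H, W).
    """
    mean = feat.mean(axis=(2, 3), keepdims=True)
    std = feat.std(axis=(2, 3), keepdims=True)
    return (feat - mean) / (std + eps)

def normalized_feature_loss(student_feat, teacher_feat):
    """Mean squared error between the normalized student and teacher features."""
    diff = normalize_features(student_feat) - normalize_features(teacher_feat)
    return float(np.mean(diff ** 2))

# Toy check: the student reproduces the teacher's spatial pattern,
# but with a different scale and offset.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(2, 4, 8, 8))
student = 0.1 * teacher + 3.0

plain_mse = float(np.mean((student - teacher) ** 2))
nfd_style = normalized_feature_loss(student, teacher)
print(plain_mse, nfd_style)
```

In this toy case the plain squared error is large because it penalizes the magnitude mismatch, while the normalized loss is close to zero because the two feature maps differ only in scale and offset.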
 
      
Related papers
- Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation [64.15918654558816]
 A self-distillation (SSD) training strategy is introduced for filtering and weighting teacher representations, so that the student distills from task-relevant representations only.
 Experimental results on real-world affective computing, wearable/biosignal datasets from the UCR Archive, the HAR dataset, and image classification datasets show that the proposed SSD method can outperform state-of-the-art methods.
 arXiv  Detail & Related papers  (2025-04-19T14:08:56Z)
- Delving Deep into Semantic Relation Distillation [40.89593967999198]
 This paper introduces a novel methodology, Semantics-based Relation Knowledge Distillation (SeRKD).
SeRKD reimagines knowledge distillation through a semantics-relation lens among samples.
It integrates superpixel-based semantic extraction with relation-based knowledge distillation for model compression and distillation.
 arXiv  Detail & Related papers  (2025-03-27T08:50:40Z)
- Efficient Knowledge Injection in LLMs via Self-Distillation [50.24554628642021]
 This paper proposes utilizing prompt distillation to internalize new factual knowledge from free-form documents.
We show that prompt distillation outperforms standard supervised fine-tuning and can even surpass RAG.
 arXiv  Detail & Related papers  (2024-12-19T15:44:01Z)
- AdaKD: Dynamic Knowledge Distillation of ASR models using Adaptive Loss Weighting [5.818420448447701]
 We propose Adaptive Knowledge Distillation, a novel technique inspired by curriculum learning that adaptively weighs the losses at the instance level.
Our method follows a plug-and-play paradigm that can be applied on top of any task-specific and distillation objectives.
 arXiv  Detail & Related papers  (2024-05-11T15:06:24Z)
- The Staged Knowledge Distillation in Video Classification: Harmonizing Student Progress by a Complementary Weakly Supervised Framework [21.494759678807686]
 We propose a new weakly supervised learning framework for knowledge distillation in video classification.
Our approach leverages the concept of substage-based learning to distill knowledge based on the combination of student substages and the correlation of corresponding substages.
Our proposed substage-based distillation approach has the potential to inform future research on label-efficient learning for video data.
 arXiv  Detail & Related papers  (2023-07-11T12:10:42Z)
- Knowledge Diffusion for Distillation [53.908314960324915]
 The representation gap between teacher and student is an emerging topic in knowledge distillation (KD).
We state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature.
We propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models.
 arXiv  Detail & Related papers  (2023-05-25T04:49:34Z)
- HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers [49.79405257763856]
 This paper focuses on task-agnostic distillation.
It produces a compact pre-trained model that can be easily fine-tuned on various tasks with small computational costs and memory footprints.
We propose Homotopic Distillation (HomoDistil), a novel task-agnostic distillation approach equipped with iterative pruning.
 arXiv  Detail & Related papers  (2023-02-19T17:37:24Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
 Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
 arXiv  Detail & Related papers  (2022-09-20T16:36:28Z)
- FAKD: Feature Augmented Knowledge Distillation for Semantic Segmentation [17.294737459735675]
 We explore data augmentations for knowledge distillation on semantic segmentation.
Inspired by the recent progress on semantic directions on feature-space, we propose to include augmentations in feature space for efficient distillation.
 arXiv  Detail & Related papers  (2022-08-30T10:55:31Z)
- Mind the Gap in Distilling StyleGANs [100.58444291751015]
 The StyleGAN family is one of the most popular Generative Adversarial Networks (GANs) for unconditional generation.
This paper provides a comprehensive study of distilling from the popular StyleGAN-like architecture.
 arXiv  Detail & Related papers  (2022-08-18T14:18:29Z)
- Knowledge Distillation Meets Open-Set Semi-Supervised Learning [69.21139647218456]
 We propose a novel method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student.
At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL).
Our method significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine face recognition tasks.
 arXiv  Detail & Related papers  (2022-05-13T15:15:27Z)
- Delta Distillation for Efficient Video Processing [68.81730245303591]
 We propose a novel knowledge distillation schema coined as Delta Distillation.
We demonstrate that these temporal variations can be effectively distilled due to the temporal redundancies within video frames.
As a by-product, delta distillation improves the temporal consistency of the teacher model.
 arXiv  Detail & Related papers  (2022-03-17T20:13:30Z)
- Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation [12.097302014936655]
 This paper proposes a novel self-knowledge distillation method, Feature Refinement via Self-Knowledge Distillation (FRSKD).
Our proposed method, FRSKD, can utilize both soft label and feature-map distillations for the self-knowledge distillation.
We demonstrate the effectiveness of FRSKD by enumerating its performance improvements in diverse tasks and benchmark datasets.
 arXiv  Detail & Related papers  (2021-03-15T10:59:43Z)
- Self-Feature Regularization: Self-Feature Distillation Without Teacher Models [0.0]
 Self-Feature Regularization (SFR) is proposed, which uses features in the deep layers to supervise feature learning in the shallow layers.
We first use a generalization-l2 loss to match local features, and then a many-to-one approach to distill more intensively in the channel dimension.
 arXiv  Detail & Related papers  (2021-03-12T15:29:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     