Rethinking Intermediate Layers design in Knowledge Distillation for Kidney and Liver Tumor Segmentation
- URL: http://arxiv.org/abs/2311.16700v2
- Date: Mon, 27 May 2024 12:27:16 GMT
- Title: Rethinking Intermediate Layers design in Knowledge Distillation for Kidney and Liver Tumor Segmentation
- Authors: Vandan Gorade, Sparsh Mittal, Debesh Jha, Ulas Bagci,
- Abstract summary: We propose Hierarchical Layer-selective Feedback Distillation (HLFD) for medical imaging tasks.
HLFD strategically distills knowledge from a combination of middle layers to earlier layers and transfers final layer knowledge to intermediate layers at both the feature and pixel levels.
In the kidney segmentation task, HLFD surpasses the student model (without KD) by over 10%, significantly improving its focus on tumor-specific features.
- Score: 4.252489463601601
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation (KD) has demonstrated remarkable success across various domains, but its application to medical imaging tasks, such as kidney and liver tumor segmentation, has encountered challenges. Many existing KD methods are not specifically tailored for these tasks. Moreover, prevalent KD methods often lack a careful consideration of `what' and `from where' to distill knowledge from the teacher to the student. This oversight may lead to issues like the accumulation of training bias within shallower student layers, potentially compromising the effectiveness of KD. To address these challenges, we propose Hierarchical Layer-selective Feedback Distillation (HLFD). HLFD strategically distills knowledge from a combination of middle layers to earlier layers and transfers final layer knowledge to intermediate layers at both the feature and pixel levels. This design allows the model to learn higher-quality representations from earlier layers, resulting in a robust and compact student model. Extensive quantitative evaluations reveal that HLFD outperforms existing methods by a significant margin. For example, in the kidney segmentation task, HLFD surpasses the student model (without KD) by over 10\%, significantly improving its focus on tumor-specific features. From a qualitative standpoint, the student model trained using HLFD excels at suppressing irrelevant information and can focus sharply on tumor-specific details, which opens a new pathway for more efficient and accurate diagnostic tools. Code is available \href{https://github.com/vangorade/RethinkingKD_ISBI24}{here}.
Related papers
- FANCL: Feature-Guided Attention Network with Curriculum Learning for Brain Metastases Segmentation [5.922836741521003]
Methods based on deep convolutional neural networks (CNNs) have achieved high segmentation performance.
Due to the loss of critical feature information caused by convolutional and pooling operations, CNNs still face great challenges in small BMs segmentation.
This paper proposes a novel model called feature-guided attention network with curriculum learning (FANCL)
arXiv Detail & Related papers (2024-10-29T14:08:39Z) - Relative Difficulty Distillation for Semantic Segmentation [54.76143187709987]
We propose a pixel-level KD paradigm for semantic segmentation named Relative Difficulty Distillation (RDD)
RDD allows the teacher network to provide effective guidance on learning focus without additional optimization goals.
Our research showcases that RDD can integrate with existing KD methods to improve their upper performance bound.
arXiv Detail & Related papers (2024-07-04T08:08:25Z) - Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD)
We state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature.
We propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models.
arXiv Detail & Related papers (2023-05-25T04:49:34Z) - Gradient-Guided Knowledge Distillation for Object Detectors [3.236217153362305]
We propose a novel approach for knowledge distillation in object detection, named Gradient-guided Knowledge Distillation (GKD)
Our GKD uses gradient information to identify and assign more weights to features that significantly impact the detection loss, allowing the student to learn the most relevant features from the teacher.
Experiments on the KITTI and COCO-Traffic datasets demonstrate our method's efficacy in knowledge distillation for object detection.
arXiv Detail & Related papers (2023-03-07T21:09:09Z) - SSD-KD: A Self-supervised Diverse Knowledge Distillation Method for
Lightweight Skin Lesion Classification Using Dermoscopic Images [62.60956024215873]
Skin cancer is one of the most common types of malignancy, affecting a large population and causing a heavy economic burden worldwide.
Most studies in skin cancer detection keep pursuing high prediction accuracies without considering the limitation of computing resources on portable devices.
This study specifically proposes a novel method, termed SSD-KD, that unifies diverse knowledge into a generic KD framework for skin diseases classification.
arXiv Detail & Related papers (2022-03-22T06:54:29Z) - Knowledge Distillation with Deep Supervision [6.8080936803807734]
We propose Deeply-Supervised Knowledge Distillation (DSKD), which fully utilizes class predictions and feature maps of the teacher model to supervise the training of shallow student layers.
A loss-based weight allocation strategy is developed in DSKD to adaptively balance the learning process of each shallow layer, so as to further improve the student performance.
arXiv Detail & Related papers (2022-02-16T03:58:21Z) - RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation [24.951887361152988]
We propose a RAndom Intermediate Layer Knowledge Distillation (RAIL-KD) approach in which, intermediate layers from the teacher model are selected randomly to be distilled into the intermediate layers of the student model.
We show that our proposed RAIL-KD approach outperforms other state-of-the-art intermediate layer KD methods considerably in both performance and training-time.
arXiv Detail & Related papers (2021-09-21T13:21:13Z) - Categorical Relation-Preserving Contrastive Knowledge Distillation for
Medical Image Classification [75.27973258196934]
We propose a novel Categorical Relation-preserving Contrastive Knowledge Distillation (CRCKD) algorithm, which takes the commonly used mean-teacher model as the supervisor.
With this regularization, the feature distribution of the student model shows higher intra-class similarity and inter-class variance.
With the contribution of the CCD and CRP, our CRCKD algorithm can distill the relational knowledge more comprehensively.
arXiv Detail & Related papers (2021-07-07T13:56:38Z) - Heterogeneous Knowledge Distillation using Information Flow Modeling [82.83891707250926]
We propose a novel KD method that works by modeling the information flow through the various layers of the teacher model.
The proposed method is capable of overcoming the aforementioned limitations by using an appropriate supervision scheme during the different phases of the training process.
arXiv Detail & Related papers (2020-05-02T06:56:56Z) - Residual Knowledge Distillation [96.18815134719975]
This work proposes Residual Knowledge Distillation (RKD), which further distills the knowledge by introducing an assistant (A)
In this way, S is trained to mimic the feature maps of T, and A aids this process by learning the residual error between them.
Experiments show that our approach achieves appealing results on popular classification datasets, CIFAR-100 and ImageNet.
arXiv Detail & Related papers (2020-02-21T07:49:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.