Knowledge Distillation for Adaptive MRI Prostate Segmentation Based on
Limit-Trained Multi-Teacher Models
- URL: http://arxiv.org/abs/2303.09494v1
- Date: Thu, 16 Mar 2023 17:15:08 GMT
- Title: Knowledge Distillation for Adaptive MRI Prostate Segmentation Based on
Limit-Trained Multi-Teacher Models
- Authors: Eddardaa Ben Loussaief, Hatem Rashwan, Mohammed Ayad, Mohammed Zakaria
Hassan, and Domenec Puig
- Abstract summary: Knowledge Distillation (KD) has been proposed as a compression method and an acceleration technology.
KD is an efficient learning strategy that can transfer knowledge from a burdensome model to a lightweight model.
We develop a KD-based deep model for prostate MRI segmentation in this work by combining feature-based distillation with Kullback-Leibler divergence, Lovasz, and Dice losses.
- Score: 4.711401719735324
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Across numerous medical tasks, the performance of deep models has recently
improved considerably. These models are often adept learners, yet their intricate
architectural design and high computational complexity make them challenging to deploy
in clinical settings, particularly on devices with limited resources. To address this
issue, Knowledge Distillation (KD) has been proposed as a compression and acceleration
technique. KD is an efficient learning strategy that transfers knowledge from a
burdensome model (i.e., the teacher model) to a lightweight model (i.e., the student
model), so we can obtain a compact model with few parameters while preserving the
teacher's performance. In this work, we therefore develop a KD-based deep model for
prostate MRI segmentation by combining feature-based distillation with Kullback-Leibler
divergence, Lovasz, and Dice losses. We further demonstrate its effectiveness with two
compression procedures: 1) distilling knowledge to a student model from a single
well-trained teacher, and 2) since most medical applications have only small datasets,
training multiple teachers, each on a small set of images, to learn an adaptive student
model that stays as close to the teachers as possible while meeting the desired accuracy
and fast inference time. Extensive experiments on a public multi-site prostate tumor
dataset show that the proposed adaptation KD strategy improves the Dice similarity score
by 9%, outperforming all tested well-established baseline models.
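As a rough illustration of the objective described above, here is a minimal PyTorch-style sketch combining a temperature-scaled KL term on softened logits, a soft Dice term on the ground-truth mask, and an MSE feature-distillation term. The loss weights, tensor shapes, the simple uniform averaging over teachers, and the omission of the Lovasz term are all assumptions made for illustration, not details taken from the paper.

```python
# Minimal sketch of a combined KD objective for segmentation (assumptions noted above).
import torch
import torch.nn.functional as F

def soft_dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss; probs and target are (B, 1, H, W) tensors in [0, 1]."""
    inter = (probs * target).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()

def kd_segmentation_loss(student_logits, teacher_logits_list,
                         student_feats, teacher_feats,
                         target, T=2.0, alpha=0.5, beta=0.1):
    """KL on softened logits + Dice on ground truth + feature-map MSE.
    teacher_logits_list holds one entry per teacher; uniform averaging here stands
    in for the paper's adaptive multi-teacher combination. The Lovasz term
    mentioned in the abstract is omitted for brevity."""
    teacher_logits = torch.stack(teacher_logits_list, dim=0).mean(dim=0)

    # Soft-label distillation term (temperature-scaled KL divergence).
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    # Supervised Dice term on the foreground channel (assumes a 2-class output).
    dice = soft_dice_loss(F.softmax(student_logits, dim=1)[:, 1:2], target)

    # Feature-based distillation: match intermediate feature maps
    # (assumes the student features were already projected to the teacher's shape).
    feat = F.mse_loss(student_feats, teacher_feats)

    return alpha * kl + (1.0 - alpha) * dice + beta * feat
```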
Related papers
- Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods in various model architectures and sizes, reducing training time by up to fourfold.
arXiv Detail & Related papers (2024-09-19T07:05:26Z)
- DistiLLM: Towards Streamlined Distillation for Large Language Models [53.46759297929675]
DistiLLM is a more effective and efficient KD framework for auto-regressive language models.
DistiLLM comprises two components: (1) a novel skew Kullback-Leibler divergence loss, where we unveil and leverage its theoretical properties, and (2) an adaptive off-policy approach designed to enhance the efficiency of utilizing student-generated outputs.
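For illustration, a short sketch of a skewed KL divergence in the spirit of the loss named above, computed between two token distributions; the mixing direction and the default alpha are assumptions rather than the paper's exact formulation.

```python
# Hedged sketch of a skew KL divergence: KL(p || alpha * p + (1 - alpha) * q).
import torch
import torch.nn.functional as F

def skew_kl(p_logits, q_logits, alpha=0.1, eps=1e-12):
    """Skewed KL between the distributions induced by two sets of logits."""
    p = F.softmax(p_logits, dim=-1)
    q = F.softmax(q_logits, dim=-1)
    mix = alpha * p + (1.0 - alpha) * q          # skewed reference distribution
    return (p * (torch.log(p + eps) - torch.log(mix + eps))).sum(dim=-1).mean()
```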
arXiv Detail & Related papers (2024-02-06T11:10:35Z)
- Knowledge Distillation with Representative Teacher Keys Based on Attention Mechanism for Image Classification Model Compression [1.503974529275767]
Knowledge distillation (KD) has been recognized as an effective model compression method for reducing model parameters.
Inspired by the attention mechanism, we propose a novel KD method called representative teacher keys (RTK).
Our proposed RTK can effectively improve the classification accuracy of the state-of-the-art attention-based KD method.
arXiv Detail & Related papers (2022-06-26T05:08:50Z)
- SSD-KD: A Self-supervised Diverse Knowledge Distillation Method for Lightweight Skin Lesion Classification Using Dermoscopic Images [62.60956024215873]
Skin cancer is one of the most common types of malignancy, affecting a large population and causing a heavy economic burden worldwide.
Most studies in skin cancer detection keep pursuing high prediction accuracies without considering the limitation of computing resources on portable devices.
This study specifically proposes a novel method, termed SSD-KD, that unifies diverse knowledge into a generic KD framework for skin disease classification.
arXiv Detail & Related papers (2022-03-22T06:54:29Z)
- How and When Adversarial Robustness Transfers in Knowledge Distillation? [137.11016173468457]
This paper studies how and when adversarial robustness can be transferred from a teacher model to a student model in knowledge distillation (KD).
We show that standard KD training fails to preserve adversarial robustness, and we propose KD with input gradient alignment (KDIGA) for remedy.
Under certain assumptions, we prove that the student model using our proposed KDIGA can achieve at least the same certified robustness as the teacher model.
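A hedged sketch of what input-gradient alignment for KD might look like: besides the usual soft-label matching, the student's loss gradient with respect to the input is pulled toward the teacher's. The specific losses, temperature, and weighting are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of KD with input gradient alignment (assumptions noted above).
import torch
import torch.nn.functional as F

def kd_input_grad_alignment(student, teacher, x, y, T=4.0, lam=1.0):
    x = x.clone().requires_grad_(True)
    s_logits = student(x)
    t_logits = teacher(x)  # teacher weights are frozen elsewhere; grads w.r.t. x are kept

    ce = F.cross_entropy(s_logits, y)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1).detach(),
                  reduction="batchmean") * (T * T)

    # Input gradients of the respective classification losses.
    s_grad = torch.autograd.grad(ce, x, create_graph=True)[0]
    t_grad = torch.autograd.grad(F.cross_entropy(t_logits, y), x,
                                 retain_graph=True)[0]
    align = F.mse_loss(s_grad, t_grad.detach())  # pull student gradients toward teacher's

    return ce + kd + lam * align
```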
arXiv Detail & Related papers (2021-10-22T21:30:53Z)
- Reinforced Multi-Teacher Selection for Knowledge Distillation [54.72886763796232]
Knowledge distillation is a popular method for model compression.
Current methods assign a fixed weight to a teacher model for the whole distillation procedure, and most existing methods allocate an equal weight to every teacher model.
In this paper, we observe that, due to the complexity of training examples and the differences in student model capability, learning differentially from teacher models can lead to better performance of student models distilled.
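As an illustration of the idea (not the paper's RL-based selection itself), here is a sketch of a multi-teacher distillation loss where each teacher receives a per-example weight; in the actual method those weights would come from a learned policy, which is assumed given here.

```python
# Sketch of per-example weighted multi-teacher distillation (weights assumed given).
import torch
import torch.nn.functional as F

def weighted_multi_teacher_kd(student_logits, teacher_logits_list, weights, T=2.0):
    """weights: (num_teachers, batch) tensor, e.g. produced by a selection policy."""
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    loss = student_logits.new_zeros(())
    for k, t_logits in enumerate(teacher_logits_list):
        p_t = F.softmax(t_logits / T, dim=-1)
        kl = (p_t * (torch.log(p_t + 1e-12) - log_p_s)).sum(dim=-1)  # per-example KL
        loss = loss + (weights[k] * kl).mean()
    return loss * (T * T)
```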
arXiv Detail & Related papers (2020-12-11T08:56:39Z)
- Ensemble Knowledge Distillation for CTR Prediction [46.92149090885551]
We propose a new model training strategy based on knowledge distillation (KD).
KD is a teacher-student learning framework to transfer knowledge learned from a teacher model to a student model.
We propose some novel techniques to facilitate ensembled CTR prediction, including teacher gating and early stopping by distillation loss.
arXiv Detail & Related papers (2020-11-08T23:37:58Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that, under reasonable conditions, MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
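A minimal sketch of mixup-style augmentation inside distillation, assuming the mixing is applied directly to the inputs and the student matches the teacher on the mixed examples; MixKD itself operates on language-model representations, so this is an analogy rather than the exact method.

```python
# Sketch of mixup-based distillation on a batch of inputs (assumptions noted above).
import torch
import torch.nn.functional as F

def mix_distill_step(student, teacher, x, alpha=0.4, T=2.0):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]      # convex combination of input pairs

    with torch.no_grad():
        t_probs = F.softmax(teacher(x_mix) / T, dim=-1)
    s_log_probs = F.log_softmax(student(x_mix) / T, dim=-1)
    return F.kl_div(s_log_probs, t_probs, reduction="batchmean") * (T * T)
```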
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
- Pea-KD: Parameter-efficient and Accurate Knowledge Distillation on BERT [20.732095457775138]
Knowledge Distillation (KD) is one of the widely known methods for model compression.
Pea-KD consists of two main parts: Shuffled Parameter Sharing (SPS) and Pretraining with Teacher's Predictions (PTP).
arXiv Detail & Related papers (2020-09-30T17:52:15Z)