Robust feature knowledge distillation for enhanced performance of lightweight crack segmentation models
- URL: http://arxiv.org/abs/2404.06258v1
- Date: Tue, 9 Apr 2024 12:32:10 GMT
- Title: Robust feature knowledge distillation for enhanced performance of lightweight crack segmentation models
- Authors: Zhaohui Chen, Elyas Asadi Shamsabadi, Sheng Jiang, Luming Shen, Daniel Dias-da-Costa
- Abstract summary: This paper develops a framework to improve robustness while retaining the precision of light models for crack segmentation.
RFKD distils knowledge from a teacher model's logit layers and intermediate feature maps while leveraging mixed clean and noisy images.
Results show a significant enhancement on noisy images, with RFKD achieving a 62% improvement in mean Dice score (mDS) over SOTA KD methods.
- Score: 2.023914201416672
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Vision-based crack detection faces deployment challenges due to the size of robust models and edge device limitations. These can be addressed with lightweight models trained with knowledge distillation (KD). However, state-of-the-art (SOTA) KD methods compromise anti-noise robustness. This paper develops Robust Feature Knowledge Distillation (RFKD), a framework to improve robustness while retaining the precision of light models for crack segmentation. RFKD distils knowledge from a teacher model's logit layers and intermediate feature maps while leveraging mixed clean and noisy images to transfer robust patterns to the student model, improving its precision, generalisation, and anti-noise performance. To validate the proposed RFKD, a lightweight crack segmentation model, PoolingCrack Tiny (PCT), with only 0.5 M parameters, is also designed and used as the student to run the framework. The results show a significant enhancement in noisy images, with RFKD reaching a 62% enhanced mean Dice score (mDS) compared to SOTA KD methods.
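A minimal PyTorch sketch of the kind of combined objective the abstract describes (supervised loss plus logit-level and feature-level distillation over a mixed clean/noisy batch) is given below. The loss weights, temperature, noise level, and two-class logit layout are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def mix_clean_noisy(images, sigma=0.1):
    """Corrupt half of the batch with Gaussian noise (sigma is illustrative)
    so robust patterns can be transferred on noisy inputs as well."""
    noisy = images + sigma * torch.randn_like(images)
    half = images.size(0) // 2
    return torch.cat([images[:half], noisy[half:]], dim=0)

def rfkd_style_loss(student_logits, teacher_logits, student_feat, teacher_feat,
                    target_mask, alpha=0.5, beta=0.5, T=4.0):
    """Supervised segmentation loss + logit-level KD + feature-level KD.
    Assumes two-class (crack/background) logits of shape [B, 2, H, W] and a
    long-typed ground-truth mask of shape [B, H, W]."""
    sup = F.cross_entropy(student_logits, target_mask)

    # Logit distillation: match softened per-pixel class distributions.
    kd_logit = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    # Feature distillation: align intermediate feature maps
    # (a 1x1 projection would be needed if channel counts differ).
    kd_feat = F.mse_loss(student_feat, teacher_feat.detach())

    return sup + alpha * kd_logit + beta * kd_feat
```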
Related papers
- Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration [17.27061613884289]
We propose a novel dynamic contrastive knowledge distillation (DCKD) framework for image restoration.
Specifically, we introduce dynamic contrastive regularization to perceive the student's learning state.
We also propose a distribution mapping module to extract and align the pixel-level category distribution of the teacher and student models.
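As a rough illustration only (the summary does not specify DCKD's exact losses), a contrastive regularizer of this general shape pulls the student's restored output toward the teacher's and away from negative samples such as the degraded input; the names and the ratio form are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_restoration_reg(student_out, teacher_out, negatives, eps=1e-8):
    """Illustrative contrastive term: the teacher's restoration is the positive,
    while degraded inputs or low-quality restorations are negatives; the ratio
    shrinks as the student approaches the positive and leaves the negatives."""
    pos = F.l1_loss(student_out, teacher_out)
    neg = torch.stack([F.l1_loss(student_out, n) for n in negatives]).mean()
    return pos / (neg + eps)
```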
arXiv Detail & Related papers (2024-12-12T05:01:17Z)
- Confidence-aware Denoised Fine-tuning of Off-the-shelf Models for Certified Robustness [56.2479170374811]
We introduce Fine-Tuning with Confidence-Aware Denoised Image Selection (FT-CADIS).
FT-CADIS is inspired by the observation that the confidence of off-the-shelf classifiers can effectively identify hallucinated images during denoised smoothing.
It achieves state-of-the-art certified robustness among denoised smoothing methods across all $\ell_2$-adversary radii on various benchmarks.
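A hedged sketch of the selection idea follows; the function names, threshold `tau`, and number of noisy copies are assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_confident_denoised(classifier, denoiser, x, sigma=0.25, n=4, tau=0.9):
    """Denoise several noisy copies of an image and keep only those the
    off-the-shelf classifier is confident about, dropping likely-hallucinated
    reconstructions before fine-tuning."""
    noisy = x.unsqueeze(0) + sigma * torch.randn(n, *x.shape)
    denoised = denoiser(noisy)                         # pretrained denoiser
    conf = F.softmax(classifier(denoised), dim=1).max(dim=1).values
    return denoised[conf >= tau]
```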
arXiv Detail & Related papers (2024-11-13T09:13:20Z)
- Stable Consistency Tuning: Understanding and Improving Consistency Models [40.2712218203989]
Diffusion models achieve superior generation quality but suffer from slow generation speed due to the iterative nature of denoising.
Consistency models, a new generative family, achieve competitive performance with significantly faster sampling.
We propose a novel framework for understanding consistency models by modeling the denoising process of the diffusion model as a Markov Decision Process (MDP) and framing consistency model training as value estimation through Temporal Difference (TD) learning.
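The TD-learning analogy can be sketched as a consistency update whose bootstrapped, stop-gradient target comes from an EMA copy of the model; the noise parameterization and distance are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def consistency_td_loss(model, ema_model, x0, t, t_prev, noise):
    """The online model at a noisier time t is regressed toward a stop-gradient
    target produced by an EMA copy at the less-noisy time t_prev, which plays
    the role of the TD target in value estimation."""
    x_t = x0 + t.view(-1, 1, 1, 1) * noise           # noisier point on the path
    x_prev = x0 + t_prev.view(-1, 1, 1, 1) * noise   # less-noisy point
    pred = model(x_t, t)
    with torch.no_grad():
        target = ema_model(x_prev, t_prev)           # bootstrapped ("TD") target
    return F.mse_loss(pred, target)
```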
arXiv Detail & Related papers (2024-10-24T17:55:52Z)
- DistiLLM: Towards Streamlined Distillation for Large Language Models [53.46759297929675]
DistiLLM is a more effective and efficient KD framework for auto-regressive language models.
DistiLLM comprises two components: (1) a novel skew Kullback-Leibler divergence loss, where we unveil and leverage its theoretical properties, and (2) an adaptive off-policy approach designed to enhance the efficiency in utilizing student-generated outputs.
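One common way to write a skew KL divergence is to mix the two distributions in the reference argument; the sketch below uses that form, with `alpha` and the mixing direction treated as assumptions rather than the paper's exact definition.

```python
import torch
import torch.nn.functional as F

def skew_kl(p_logits, q_logits, alpha=0.1, eps=1e-12):
    """Skew KL of the form KL(p || alpha*p + (1-alpha)*q): mixing a little of
    the target distribution into the reference keeps the divergence finite
    even where q assigns near-zero probability."""
    p = F.softmax(p_logits, dim=-1)
    q = F.softmax(q_logits, dim=-1)
    mix = alpha * p + (1 - alpha) * q
    return (p * (p.clamp_min(eps).log() - mix.clamp_min(eps).log())).sum(-1).mean()
```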
arXiv Detail & Related papers (2024-02-06T11:10:35Z)
- Co-training and Co-distillation for Quality Improvement and Compression of Language Models [88.94539115180919]
Knowledge Distillation (KD) compresses expensive pre-trained language models (PLMs) by transferring their knowledge to smaller models.
Most smaller models fail to surpass the performance of the original larger model, so performance is sacrificed to improve inference speed.
We propose Co-Training and Co-Distillation (CTCD), a novel framework that improves performance and inference speed together by co-training two models.
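A minimal sketch of co-training with bidirectional distillation is shown below; the temperature and weighting are illustrative, and the actual CTCD objective may differ.

```python
import torch
import torch.nn.functional as F

def co_distill_step(model_a, model_b, x, labels, T=2.0, lam=0.5):
    """Each model is trained on its task loss plus a KD term toward the other
    model's softened predictions, so knowledge flows in both directions
    instead of only from a fixed teacher to a student."""
    logits_a, logits_b = model_a(x), model_b(x)
    ce_a = F.cross_entropy(logits_a, labels)
    ce_b = F.cross_entropy(logits_b, labels)
    kd_a = F.kl_div(F.log_softmax(logits_a / T, dim=-1),
                    F.softmax(logits_b.detach() / T, dim=-1),
                    reduction="batchmean") * T * T
    kd_b = F.kl_div(F.log_softmax(logits_b / T, dim=-1),
                    F.softmax(logits_a.detach() / T, dim=-1),
                    reduction="batchmean") * T * T
    return (ce_a + lam * kd_a) + (ce_b + lam * kd_b)
```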
arXiv Detail & Related papers (2023-11-06T03:29:00Z)
- Knowledge Distillation Performs Partial Variance Reduction [93.6365393721122]
Knowledge distillation is a popular approach for enhancing the performance of "student" models.
The underlying mechanics behind knowledge distillation (KD) are still not fully understood.
We show that KD can be interpreted as a novel type of variance reduction mechanism.
arXiv Detail & Related papers (2023-05-27T21:25:55Z)
- Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD).
We state that the essence of these methods is to discard the noisy information and distill the valuable information in the features.
We propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models.
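A very small sketch of the idea, assuming a learned denoising module `denoiser` applied to the student feature before matching it to the teacher; the diffusion details are omitted and not taken from the paper.

```python
import torch.nn.functional as F

def diffkd_style_feature_loss(denoiser, student_feat, teacher_feat, t):
    """Treat the student feature as a noisy observation of the teacher feature,
    denoise it with a small diffusion-style module conditioned on timestep t,
    and match the result to the teacher feature."""
    denoised = denoiser(student_feat, t)
    return F.mse_loss(denoised, teacher_feat.detach())
```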
arXiv Detail & Related papers (2023-05-25T04:49:34Z)
- BD-KD: Balancing the Divergences for Online Knowledge Distillation [11.874952582465601]
We introduce BD-KD (Balanced Divergence Knowledge Distillation), a framework for logit-based online KD.
BD-KD enhances both accuracy and model calibration simultaneously, eliminating the need for post-hoc recalibration techniques.
Our method encourages student-centered training by adjusting the conventional online distillation loss on both the student and teacher sides.
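One way to read "balancing the divergences" is a weighted combination of forward and reverse KL between softened student and teacher distributions; the sketch below is an interpretation, with `beta` and the temperature as assumptions.

```python
import torch
import torch.nn.functional as F

def balanced_divergence(student_logits, teacher_logits, T=2.0, beta=0.5):
    """Weighted sum of forward and reverse KL between softened student and
    teacher distributions, instead of the single one-way KL used in
    conventional (online) distillation."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.log_softmax(teacher_logits / T, dim=-1)
    fwd = F.kl_div(s, t, log_target=True, reduction="batchmean")  # KL(teacher || student)
    rev = F.kl_div(t, s, log_target=True, reduction="batchmean")  # KL(student || teacher)
    return (beta * fwd + (1 - beta) * rev) * T * T
```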
arXiv Detail & Related papers (2022-12-25T22:27:32Z)
- Deep Learning-Based Defect Classification and Detection in SEM Images [1.9206693386750882]
In particular, we train RetinaNet models using different ResNet and VGGNet architectures as backbones.
We propose a preference-based ensemble strategy to combine the output predictions from different models to achieve better performance on classification and detection of defects.
arXiv Detail & Related papers (2022-06-20T16:34:11Z)
- CEKD: Cross Ensemble Knowledge Distillation for Augmented Fine-grained Data [7.012047150376948]
The proposed model can be trained in an end-to-end manner, and only requires image-level label supervision.
With a ResNet-101 backbone, CEKD obtains accuracies of 89.59%, 95.96%, and 94.56% on the three datasets, respectively.
arXiv Detail & Related papers (2022-03-13T02:57:25Z)
- How and When Adversarial Robustness Transfers in Knowledge Distillation? [137.11016173468457]
This paper studies how and when adversarial robustness can be transferred from a teacher model to a student model in knowledge distillation (KD).
We show that standard KD training fails to preserve adversarial robustness, and we propose KD with input gradient alignment (KDIGA) as a remedy.
Under certain assumptions, we prove that the student model using our proposed KDIGA can achieve at least the same certified robustness as the teacher model.
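A hedged PyTorch sketch of input gradient alignment as described; the loss weights and the squared-error penalty are illustrative choices, not necessarily KDIGA's exact formulation.

```python
import torch
import torch.nn.functional as F

def kd_with_input_gradient_alignment(student, teacher, x, y, T=2.0, lam=1.0):
    """Standard logit distillation plus a penalty that aligns the gradient of
    the task loss w.r.t. the input between student and teacher, one way
    robustness properties can be carried over to the student."""
    x = x.detach().clone().requires_grad_(True)
    s_logits, t_logits = student(x), teacher(x)

    ce = F.cross_entropy(s_logits, y)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits.detach() / T, dim=-1),
                  reduction="batchmean") * T * T

    # Input gradients of the task loss for both models.
    g_s = torch.autograd.grad(ce, x, create_graph=True)[0]
    g_t = torch.autograd.grad(F.cross_entropy(t_logits, y), x)[0]

    align = (g_s - g_t.detach()).pow(2).flatten(1).sum(dim=1).mean()
    return ce + kd + lam * align
```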
arXiv Detail & Related papers (2021-10-22T21:30:53Z)