Self-Knowledge Distillation for Surgical Phase Recognition
- URL: http://arxiv.org/abs/2306.08961v1
- Date: Thu, 15 Jun 2023 08:55:00 GMT
- Title: Self-Knowledge Distillation for Surgical Phase Recognition
- Authors: Jinglu Zhang, Santiago Barbarisi, Abdolrahim Kadkhodamohammadi, Danail Stoyanov, Imanol Luengo
- Abstract summary: We propose a self-knowledge distillation framework that can be integrated into current state-of-the-art (SOTA) models.
Our framework is embedded on top of four popular SOTA approaches and consistently improves their performance.
- Score: 8.708027525926193
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Purpose: Advances in surgical phase recognition are generally led by training
deeper networks. Rather than pursuing ever more complex solutions, we believe
that current models can be exploited better. We propose a self-knowledge
distillation framework that can be integrated into current state-of-the-art
(SOTA) models without adding complexity to the models or requiring extra
annotations.
Methods: Knowledge distillation is a network regularization framework in which
knowledge is distilled from a teacher network to a student network. In
self-knowledge distillation, the student model acts as its own teacher, so the
network learns from itself. Most phase recognition models follow an
encoder-decoder framework, and our framework applies self-knowledge
distillation in both stages: the teacher guides the training of the student to
extract enhanced feature representations in the encoder and to build a more
robust temporal decoder that tackles the over-segmentation problem (an
illustrative sketch of such a two-stage distillation loss follows the abstract).
Results: We validate our proposed framework on the public dataset Cholec80.
Our framework is embedded on top of four popular SOTA approaches and
consistently improves their performance. Specifically, our best GRU model
improves accuracy by +3.33% and F1-score by +3.95% over the same baseline
model.
Conclusion: We embed a self-knowledge distillation framework for the first
time in the surgical phase recognition training pipeline. Experimental results
demonstrate that our simple yet powerful framework can improve performance of
existing phase recognition models. Moreover, our extensive experiments show
that even with 75% of the training set we still achieve performance on par with
the same baseline model trained on the full set.
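Illustrative sketch (not from the paper): the two-stage self-distillation described in the Methods paragraph can be pictured, under assumptions, as a teacher that is an exponential-moving-average (EMA) copy of the student, an MSE term that distills encoder features, and a temperature-scaled KL term that distills the temporal decoder's phase predictions. All names (update_ema_teacher, self_distillation_loss), loss weights, and the EMA teacher construction below are hypothetical choices for illustration; the paper's exact formulation may differ.
```python
# Hypothetical PyTorch sketch of two-stage self-knowledge distillation for an
# encoder-decoder surgical phase recognition model. The EMA teacher, the MSE
# feature term, the KL soft-label term, and all weights are illustrative
# assumptions, not the paper's exact method.
import torch
import torch.nn.functional as F


def update_ema_teacher(student, teacher, momentum=0.999):
    """Keep the teacher as an exponential moving average of the student."""
    with torch.no_grad():
        for p_s, p_t in zip(student.parameters(), teacher.parameters()):
            p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)


def self_distillation_loss(student, teacher, frames, labels,
                           temperature=2.0, w_feat=0.5, w_kd=0.5):
    """Cross-entropy supervision plus encoder feature distillation and
    decoder soft-label distillation. Assumes the model returns
    (frame_features, phase_logits) for a clip of T frames."""
    s_feat, s_logits = student(frames)            # (T, D), (T, num_phases)
    with torch.no_grad():
        t_feat, t_logits = teacher(frames)

    ce = F.cross_entropy(s_logits, labels)        # standard supervised loss
    feat_kd = F.mse_loss(s_feat, t_feat)          # stage 1: encoder features
    soft_kd = F.kl_div(                           # stage 2: temporal decoder
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return ce + w_feat * feat_kd + w_kd * soft_kd
```
In this sketch the teacher starts as a deep copy of the student and is refreshed with update_ema_teacher after every optimizer step, which is one common way to realize a network that "learns from itself"; it adds no parameters or annotations at inference time.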
Related papers
- Generative Model-based Feature Knowledge Distillation for Action Recognition [11.31068233536815]
Our paper introduces an innovative knowledge distillation framework that uses a generative model to train a lightweight student model.
The efficacy of our approach is demonstrated through comprehensive experiments on diverse popular datasets.
arXiv Detail & Related papers (2023-12-14T03:55:29Z)
- Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation [56.053397775016755]
We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
arXiv Detail & Related papers (2023-08-17T17:17:08Z)
- Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time.
Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP.
Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
arXiv Detail & Related papers (2023-06-12T15:52:02Z)
- Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment [52.704331909850026]
We formulate a new knowledge distillation framework to transfer the knowledge from self-supervised pre-trained models to any other student network.
Inspired by the spirit of instance discrimination in self-supervised learning, we model the instance-instance relations by a graph formulation in the feature embedding space.
Our distillation scheme can be flexibly applied to transfer the self-supervised knowledge to enhance representation learning on various student networks.
arXiv Detail & Related papers (2022-11-23T19:27:48Z)
- Alignahead: Online Cross-Layer Knowledge Extraction on Graph Neural Networks [6.8080936803807734]
Existing knowledge distillation methods on graph neural networks (GNNs) are almost all offline.
We propose a novel online knowledge distillation framework to resolve this problem.
We develop a cross-layer distillation strategy that aligns one student layer ahead with a layer at a different depth of another student model.
arXiv Detail & Related papers (2022-05-05T06:48:13Z)
- On the benefits of knowledge distillation for adversarial robustness [53.41196727255314]
We show that knowledge distillation can be used directly to boost the performance of state-of-the-art models in adversarial robustness.
We present Adversarial Knowledge Distillation (AKD), a new framework to improve a model's robust performance.
arXiv Detail & Related papers (2022-03-14T15:02:13Z)
- Learning by Distillation: A Self-Supervised Learning Framework for Optical Flow Estimation [71.76008290101214]
DistillFlow is a knowledge distillation approach to learning optical flow.
It achieves state-of-the-art unsupervised learning performance on both KITTI and Sintel datasets.
Our models rank 1st among all monocular methods on the KITTI 2015 benchmark and outperform all published methods on the Sintel Final benchmark.
arXiv Detail & Related papers (2021-06-08T09:13:34Z)
- Distill on the Go: Online knowledge distillation in self-supervised learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gain in the presence of noisy and limited labels.
arXiv Detail & Related papers (2021-04-20T09:59:23Z)
- Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones [40.33419553042038]
We propose to improve existing baseline networks via knowledge distillation from large, powerful, off-the-shelf pre-trained models.
Our solution performs distillation by only driving the predictions of the student model to be consistent with those of the teacher model.
We empirically find that such a simple distillation setting is extremely effective; for example, the top-1 accuracy of MobileNetV3-large and ResNet50-D on the ImageNet-1k validation set can be significantly improved.
arXiv Detail & Related papers (2021-03-10T09:32:44Z)
- Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning [93.18238573921629]
We study how ensembles of deep learning models can improve test accuracy, and how the superior performance of an ensemble can be distilled into a single model.
We show that ensemble/knowledge distillation in deep learning works very differently from traditional learning theory.
We prove that self-distillation can also be viewed as implicitly combining ensemble and knowledge distillation to improve test accuracy.
arXiv Detail & Related papers (2020-12-17T18:34:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.