ACT-Net: Asymmetric Co-Teacher Network for Semi-supervised
Memory-efficient Medical Image Segmentation
- URL: http://arxiv.org/abs/2207.01900v1
- Date: Tue, 5 Jul 2022 08:58:15 GMT
- Title: ACT-Net: Asymmetric Co-Teacher Network for Semi-supervised
Memory-efficient Medical Image Segmentation
- Authors: Ziyuan Zhao, Andong Zhu, Zeng Zeng, Bharadwaj Veeravalli, Cuntai Guan
- Abstract summary: High-accuracy deep models usually come in large model sizes, limiting their deployment in real scenarios.
We propose a novel asymmetric co-teacher framework, ACT-Net, to alleviate the burden of both expensive annotations and computational costs for semi-supervised knowledge distillation.
- Score: 19.528360162691342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep models have shown promising performance in medical image
segmentation, they heavily rely on a large amount of well-annotated data, which
is difficult to access, especially in clinical practice. On the other hand,
high-accuracy deep models usually come in large model sizes, limiting their
deployment in real scenarios. In this work, we propose a novel asymmetric
co-teacher framework, ACT-Net, to alleviate the burden of both expensive
annotations and computational costs for semi-supervised knowledge distillation.
We advance teacher-student learning with a co-teacher network to facilitate
asymmetric knowledge distillation from large models to small ones by
alternating student and teacher roles, obtaining tiny but accurate models for
clinical deployment. To verify the effectiveness of our ACT-Net, we employ the
ACDC dataset for cardiac substructure segmentation in our experiments.
Extensive experimental results demonstrate that ACT-Net outperforms other
knowledge distillation methods and achieves lossless segmentation performance
with 250x fewer parameters.
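Below is a minimal, hedged sketch of the kind of asymmetric teacher-student distillation with alternating roles that the abstract describes. The abstract does not specify the co-teacher architecture, the role-switching schedule, the loss weights, or the data pipeline, so the model and loader names, the even/odd alternation, and all hyperparameters here are illustrative assumptions rather than the paper's actual method.

```python
# Hedged sketch of semi-supervised knowledge distillation with alternating
# teacher/student roles between a large and a small segmentation network.
# All names and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F


def distill_step(teacher, student, optimizer, images, labels=None, T=2.0, alpha=0.5):
    """One update of `student` using soft targets from `teacher`.
    `labels` is None for unlabeled batches (semi-supervised setting)."""
    teacher.eval()
    student.train()
    with torch.no_grad():
        t_logits = teacher(images)                      # [B, C, H, W]
    s_logits = student(images)

    # Output-level distillation: KL between temperature-softened distributions.
    kd = F.kl_div(
        F.log_softmax(s_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    loss = kd
    if labels is not None:
        # Supervised term on labeled data (plain cross-entropy here).
        loss = alpha * F.cross_entropy(s_logits, labels) + (1.0 - alpha) * kd

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def train(large_net, small_net, opt_large, opt_small,
          labeled_loader, unlabeled_loader, iterations=1000):
    """Alternate roles: on even steps the large model teaches the small one,
    on odd steps the small model teaches the large one. This schedule is an
    assumption for illustration, not the schedule used by ACT-Net."""
    labeled_iter, unlabeled_iter = iter(labeled_loader), iter(unlabeled_loader)
    for step in range(iterations):
        try:
            x_l, y_l = next(labeled_iter)
        except StopIteration:
            labeled_iter = iter(labeled_loader)
            x_l, y_l = next(labeled_iter)
        try:
            x_u = next(unlabeled_iter)          # loader assumed to yield images only
        except StopIteration:
            unlabeled_iter = iter(unlabeled_loader)
            x_u = next(unlabeled_iter)

        if step % 2 == 0:
            teacher, student, opt = large_net, small_net, opt_small
        else:
            teacher, student, opt = small_net, large_net, opt_large

        distill_step(teacher, student, opt, x_l, y_l)   # labeled batch
        distill_step(teacher, student, opt, x_u, None)  # unlabeled batch
```

After training, only the compact network would be kept for inference, which is where the parameter savings described in the abstract come from.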
Related papers
- Few-Shot Airway-Tree Modeling using Data-Driven Sparse Priors [0.0]
Few-shot learning approaches make it cost-effective to transfer pre-trained models using only limited annotated data.
We train a data-driven sparsification module to enhance airways efficiently in lung CT scans.
We then incorporate these sparse representations in a standard supervised segmentation pipeline as a pretraining step to enhance the performance of the DL models.
arXiv Detail & Related papers (2024-07-05T13:46:11Z)
- Multi-Task Multi-Scale Contrastive Knowledge Distillation for Efficient Medical Image Segmentation [0.0]
This thesis aims to investigate the feasibility of knowledge transfer between neural networks for medical image segmentation tasks.
In the context of medical imaging, where the data volumes are often limited, leveraging knowledge from a larger pre-trained network could be useful.
arXiv Detail & Related papers (2024-06-05T12:06:04Z)
- TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation [6.856317526681759]
Visual place recognition plays a pivotal role in autonomous exploration and navigation of mobile robots.
Existing methods overcome this by exploiting powerful yet large networks.
We propose a high-performance teacher and lightweight student distillation framework called TSCM.
arXiv Detail & Related papers (2024-04-02T02:29:41Z)
- Knowledge Distillation for Adaptive MRI Prostate Segmentation Based on Limit-Trained Multi-Teacher Models [4.711401719735324]
Knowledge Distillation (KD) has been proposed as a compression method and an acceleration technology.
KD is an efficient learning strategy that can transfer knowledge from a burdensome model to a lightweight model.
We develop a KD-based deep model for prostate MRI segmentation in this work by combining feature-based distillation with Kullback-Leibler divergence, Lovasz, and Dice losses (a hedged sketch of such a combined loss appears after this list).
arXiv Detail & Related papers (2023-03-16T17:15:08Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
- SECP-Net: SE-Connection Pyramid Network of Organ At Risk Segmentation for Nasopharyngeal Carcinoma [0.0]
Deep learning models have been widely applied in medical image segmentation tasks.
Traditional deep neural networks underperform during segmentation due to their limited use of global and multi-size information.
This paper proposes a new SE-Connection Pyramid Network (SECP-Net) for improving the segmentation performance.
arXiv Detail & Related papers (2021-12-28T07:48:18Z)
- Knowledge distillation: A good teacher is patient and consistent [71.14922743774864]
There is a growing discrepancy in computer vision between large-scale models that achieve state-of-the-art performance and models that are affordable in practical applications.
We identify certain implicit design choices, which may drastically affect the effectiveness of distillation.
We obtain a state-of-the-art ResNet-50 model for ImageNet, which achieves 82.8% top-1 accuracy.
arXiv Detail & Related papers (2021-06-09T17:20:40Z)
- DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning [94.89221799550593]
Self-supervised representation learning (SSL) has received widespread attention from the community.
Recent research argues that its performance drops sharply when the model size decreases.
We propose a simple yet effective Distilled Contrastive Learning (DisCo) to ease the issue by a large margin.
arXiv Detail & Related papers (2021-04-19T08:22:52Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
- Contrastive Distillation on Intermediate Representations for Language Model Compression [89.31786191358802]
We propose Contrastive Distillation on Intermediate Representations (CoDIR) as a principled knowledge distillation framework.
By learning to distinguish a positive sample from a large set of negative samples, CoDIR facilitates the student's exploitation of rich information in the teacher's hidden layers.
CoDIR can be readily applied to compress large-scale language models in both pre-training and finetuning stages, and achieves superb performance on the GLUE benchmark.
arXiv Detail & Related papers (2020-09-29T17:31:43Z)
- Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model [57.41841346459995]
We study how to train a student deep neural network for visual recognition by distilling knowledge from a blackbox teacher model in a data-efficient manner.
We propose an approach that blends mixup and active learning.
arXiv Detail & Related papers (2020-03-31T05:44:55Z)
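The "Knowledge Distillation for Adaptive MRI Prostate Segmentation" entry above names a concrete recipe: feature-based distillation combined with Kullback-Leibler divergence, Lovasz, and Dice losses. Below is a minimal sketch of what such a combined distillation loss can look like; the Lovasz term is omitted for brevity, and the loss weights, feature shapes, and helper names are assumptions, not that paper's implementation.

```python
# Hedged sketch of a combined segmentation distillation loss: output-level KL
# distillation + supervised Dice + feature-map matching. Illustrative only.
import torch
import torch.nn.functional as F


def soft_dice_loss(logits, targets, eps=1e-6):
    """Multi-class soft Dice; `targets` holds integer class indices [B, H, W]."""
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes=probs.shape[1]).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)
    intersection = torch.sum(probs * one_hot, dims)
    union = torch.sum(probs + one_hot, dims)
    return 1.0 - torch.mean((2.0 * intersection + eps) / (union + eps))


def kd_segmentation_loss(s_logits, t_logits, s_feat, t_feat, targets,
                         T=2.0, w_kl=1.0, w_dice=1.0, w_feat=0.1):
    # Output-level distillation: KL between temperature-softened distributions.
    kl = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    # Supervised region-overlap term against the ground-truth masks.
    dice = soft_dice_loss(s_logits, targets)
    # Feature-based distillation: match intermediate representations
    # (assumes the student feature map is already projected to the teacher's shape).
    feat = F.mse_loss(s_feat, t_feat)
    return w_kl * kl + w_dice * dice + w_feat * feat
```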
This list is automatically generated from the titles and abstracts of the papers on this site.